


THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 








Volume XXXIII October, 1942 Number 7 








THE NATURE OF CHANGES IN ATTITUDES OF 
COLLEGE STUDENTS TOWARD WAR OVER AN 
ELEVEN-YEAR PERIOD 


VERNON JONES 
Clark University 


PURPOSE AND PROCEDURE 


The purpose of this investigation has been to study the changes 
in attitudes toward war of comparable groups of college students over 
a long period of time, and to seek from these results better under- 
standing of the nature of such attitudes. When the study was begun 
in the peaceful year of 1930, we had not the power of prevision, of 
course, to foresee that the variables would shift so fortunately (!) 
for our investigation. But since all the original data were preserved, 
it has been possible to make many of the comparisons which we would 
have made if we had known that the United States would, during that 
time, go from peace and strong feelings of security to war and all 
the fears and dislocations that have come in its wake. 

The attitudes of all the undergraduate students in many classes 
in Clark College have been studied periodically during the eleven-year 
period by means of the Thurstone-Droba Scale and during part of this 
period by a scale constructed by the writer. The total number of 
students tested in this sample was approximately seven hundred. 
Several entire classes have been followed over their four years in 
college. In addition to the testing at Clark College, one hundred 
fifteen students in other colleges and one hundred twenty-five high- 
school juniors and seniors were measured in 1941 for purposes of 
comparison. Results will be presented here (1) in terms of “average”
attitudes following Thurstone’s assumption of a militarism-pacifism 
continuum, and (2) in terms of more specific attitudes as revealed on 
individual items. A comparison between the changes in “average” 
attitudes and changes in specific attitudes, especially after Munich
and Pearl Harbor, will lead to suggestions concerning the nature of
attitudes which the study of “average” attitudes does not reveal.

In each of the Thurstone scales there are twenty-two items ranging 
from very militaristic statements to very pacifistic ones. Each 
student who is tested is simply required to check those statements 
with which he agrees and to place a cross-mark before those with 
which he disagrees. The examiner is provided with a scoring key 
giving a scaled value for each item. In deriving these scaled values, 
Thurstone assumed that a given individual’s attitude toward war 
could be placed on a single continuum ranging from militarism at 
one end to pacifism at the other. He used “competent” judges to
help him determine the scaled value of each item on this supposed
continuum. The scaled values, as they were finally arranged, extended
from .2, for the most militaristic item, to 10.5, for the most pacifistic.
The student’s score on the scale is the median of the scaled scores of
the items which he endorses. This score we shall refer to as the
“average” attitude score.
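The scoring rule just described can be sketched in a few lines of Python. This is an illustration only, not part of the original procedure: the function name is hypothetical, and the treatment of a blank paper (raising an error) is an assumption. The values 4.1, 6.3, 7.0, and 9.0 are borrowed from scale values quoted elsewhere in this article simply to give plausible numbers; they do not represent any actual student's paper.

```python
from statistics import median


def average_attitude_score(endorsed_scale_values):
    """Return the median of the scaled values of the endorsed items.

    `endorsed_scale_values` is a hypothetical list holding the scale
    value of each statement the student marked as agreeing with.
    """
    if not endorsed_scale_values:
        # Assumption: a paper with no endorsed items has no defined score.
        raise ValueError("no items endorsed, so no score is defined")
    return median(endorsed_scale_values)


# Hypothetical student who endorsed four statements.
example_score = average_attitude_score([4.1, 6.3, 7.0, 9.0])
print(example_score)
```

With an even number of endorsed items the median falls halfway between the two middle values (here between 6.3 and 7.0), so a student's “average” score need not coincide with the scale value of any statement he actually endorsed.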


STUDENT SAMPLING 


The bulk of the subjects used in this study was from Clark College, 
a small New England liberal arts college for men. The student 
body of the college is drawn mainly from a large industrial city and 
nearby towns. The median mental ability of this group, as measured 
at time of entrance on the American Council Psychological Examina- 
tion, was between the 65th and 70th percentiles on the national norms. 
The economic and cultural background of the students, as indicated 
by occupations of fathers, was below that of the average college stu- 
dent. In order to avoid possibilities of unfair selection within our 
group, special care was taken to get all students of each class studied 
to fill out the scales. The instructions to the subjects were confined to 
explanations as to how the blanks were to be filled out and to emphasis 
upon the point that the frank expression of opinion was desired.¹
The items on the scales were at no time used for group discussion, and
no extra copies of the scales were allowed to get out. The coöperation
of the students was excellent, and the consistency of the results bears
out the general impression which we had; namely, that the scales
were marked carefully and conscientiously with very few exceptions.
What has been said about the coöperativeness of the Clark group
applies quite generally also to our other groups: The University of
Nebraska group (sixty-five men and women), the Endicott College
women (N = 50), and the high-school group.

¹ The scales were not signed, but each student was requested to place the
questionnaire in an envelope, seal it, and write his name on the tab attached to the
envelope. The students were assured that their names would be removed before
the envelopes were unsealed and that they need have no fear of revealing their
attitudes freely.


RESULTS 


The main findings of the study can be presented best, perhaps, by 
treating separately the results of three periods of time, during which 
the world conditions as they impinged on attitudes toward war were 
very different. First, let us study the attitudes and changes in 
attitudes during the peaceful years of 1930-1936, together with 
facts on individual variability during this period. Secondly, let us 
consider the changes between Munich and October 1941. Thirdly, 
let us look at the changes after Pearl Harbor, giving particular atten- 
tion to the items of the scale on which the greatest changes took place. 

It will be noted in the above outline that stress has been placed 
upon individual variability on the different items and upon modifica- 
tions in attitudes as revealed by separate items. This attention upon 
individual items will take on added significance as the report proceeds. 
Changes in “average” attitudes are of some interest, but these averages
conceal much which the study of individual items has to reveal,
especially concerning the nature of attitude changes. In each of the
time periods data based on “average” attitudes will be presented
first, and this will be followed by material based on the study of 
individual items. 

The Period 1930-1936.—During this period two hundred eighty-six
freshmen and one hundred fifteen seniors in our Clark College sampling
were tested on Scale A of the Thurstone Scale. The mean score for
the freshmen over that period was 6.81 (σ_mean = .05), while the mean
score for the seniors was 7.36 (σ_mean = .12). The difference of +.55
between the seniors and the freshmen seems small in comparison with
the total range on the scale. But this is a statistically reliable shift
in the direction of pacifism (critical ratio = 5.00). Two entire
classes were followed over their four years in college,¹ and the average
change, based on the comparison of each man’s score as a freshman
with his later score as a senior, was +.67. The critical ratio for this
difference is 4.19, which again indicates a statistically reliable
difference. When it is considered that 69.5 per cent of the items endorsed
by this group as freshmen were again endorsed as seniors, it is seen
that rather appreciable changes must have taken place on some of the
remaining items.

¹ For detailed results for this period see: Jones, Vernon: “Attitudes of college
students and the changes in such attitudes during four years in college.” Journal
of Educational Psychology, Vol. XXIX, 1938, pp. 14-25, 114-134.

During the years 1930-1936 the average scores of the different 
incoming freshman classes were closely similar. This was true also 
for the different senior classes. Moreover, the results which we found 
in Clark College were very similar to those reported by Carlson¹
in Chicago and by Sowards² in Kansas, the only strictly comparable
studies available for that period.

The “average” attitude of freshmen (that is, 6.81) is typified by
endorsement of such statements as these: 


It is impossible to have a large military force without being tempted to use
it (scale value, 7.0).

Nations should agree not to intervene with military force in purely
commercial or financial disputes (scale value, 6.3).


From these results based on averages, one might picture the acceptance 
of such items as these, appearing in the middle ranges, as being very 
high and the acceptance of items farther out toward the extremes 
as always being much smaller. But this was not the case, as will 
be shown later. It would have been true if the spread in the scaled 
scores of the items accepted by each individual were reasonably small. 
But this spread was very large. In the case of the freshmen it was 
seventy-one per cent as large as it would have been if they had endorsed 
every statement on the scale; for the seniors it was sixty-three per 
cent as large. These figures were arrived at by arranging the scaled 
scores (corresponding to the individual statements) of each individual 
in order from high to low and then computing the interquartile range. 
This having been done for each individual, these interquartile ranges 
were averaged. For the freshmen this average interquartile range 
was 4.25; for the seniors, 3.75. The total possible interquartile range, 
if all items were endorsed, was 6.0. 
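The variability measure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the computation actually carried out for the study: the function names are hypothetical, and the quartile convention (Python's "inclusive" interpolation) is an assumption, since the text does not say how the quartiles were located. Only the figures 4.25, 3.75, and 6.0 come from the article.

```python
from statistics import quantiles


def interquartile_range(scale_values):
    """Spread of one student's endorsed items: third quartile minus first."""
    # "inclusive" interpolation is an assumed convention (not from the article).
    q1, _median, q3 = quantiles(scale_values, n=4, method="inclusive")
    return q3 - q1


def mean_interquartile_range(students):
    """Average the per-student interquartile ranges over a group."""
    spreads = [interquartile_range(values) for values in students]
    return sum(spreads) / len(spreads)


def relative_spread(mean_iqr, full_scale_iqr=6.0):
    """Mean IQR as a fraction of the IQR obtained if every item were endorsed."""
    return mean_iqr / full_scale_iqr


# Figures reported in the text: 4.25 for freshmen, 3.75 for seniors, 6.0 possible.
freshman_ratio = relative_spread(4.25)
senior_ratio = relative_spread(3.75)
print(f"{freshman_ratio:.3f} {senior_ratio:.3f}")
```

The two ratios, roughly .71 and .625, correspond to the seventy-one and sixty-three per cent quoted in the text. In practice `mean_interquartile_range` would be given one list of endorsed scale values per student.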





¹ Carlson, H. B.: “Attitudes of undergraduate students.” J. soc. Psychol.,
Vol. V, 1934, pp. 202-213.

² Sowards, G. S.: “A study of war attitudes of college students.” J. abn. & soc.
Psychol., Vol. XXIX, 1934, pp. 328-333.

³ The r (Pearson) between individual variability and intelligence as measured
on the A.C.E. Psychological Examination was -.23 for freshmen (N = 77) and
-.20 for the same men as seniors. The r between individual variability and
average position on the attitude scale was -.46 for freshmen and -.47 for seniors
(N = 77). The r’s mean that the higher the intelligence and the higher the degree
of pacifism the lower the individual spread on the attitude scale.

For these and many other results on individual variability, the writer is
indebted to David W. Lupien, Jr. whose Master’s thesis was on “Variability in
Attitudes of College Students.” Unpublished manuscript, Clark University
Library.

This very large individual variability does not undermine the
results based on “average” attitudes for the years 1930-1936, but
it does serve as a warning that an individual’s general “average”
position toward war and peace might be at one point on an omnibus
scale, such as the Thurstone, while his attitude toward a specific issue
might be quite different. This scale does not take into account the
fact that an individual’s attitude toward war as colored by his attitude
toward economics might be one thing, while his attitude toward war
as looked at from the point of view of humanitarian considerations
might be another, and his attitude toward war when looked at from
the point of view of protection of home and country might be still
another.

With this warning in mind that wide individual variability was
found in the years 1930-1936, and that this fact might be related to
the nature of changes in attitudes as the country approached war,
let us proceed to a study of “average” and specific attitudes in our
second period.

1937—October 1941.—During this period seven groups were
examined: Five in our Liberal Arts College for men, one in Endicott Junior
College (Massachusetts) for women, and one class group (in
Psychology) consisting of both men and women at the University of
Nebraska. The results based on “average” attitudes are given in
Table I. For all groups data are given on Form A and, in all except
two, on Forms A and B combined.¹

¹ The two forms are definitely not comparable in average scores which they
yield. Average scores on Form B run uniformly about 1.0 higher than those on
A, a reliable difference. In a group of seventy-five freshmen average scores on
Form A correlated with those on B to the extent of only .46. In a group of two
hundred sixty-eight subjects taking both forms, it was found that fifty-two per
cent of the items on Form A were endorsed on the average, while on B only
thirty-eight per cent were endorsed. In light of these facts, showing that the two forms
do not team up together very well, we have based most of our conclusions on Form
A, which seemed the more satisfactory form, all things considered.







It will be immediately seen in Table I that the tendency which 
was noted in the period 1930-1936 for the seniors to be significantly 
more pacifistic than the freshmen has suddenly stopped. The average 
score on Form A for Clark freshmen had dropped from 6.81 in 1930- 
1936 to 6.58 in 1941. The average score for seniors had dropped 
from 7.36 to 6.68. This change on the part of the seniors is statistically 
reliable; that for the freshmen comes a little under the requirements 
for statistical reliability, but when this difference is considered in 
combination with the consistent decrease for freshmen from 1938- 
1941, the total trend is significant, although very small in size. 


TABLE I.—ATTITUDES OF COLLEGE STUDENTS TOWARD PEACE AND WAR AND
CHANGES IN SUCH ATTITUDES BETWEEN SEPTEMBER 1938 AND SEPTEMBER 1941

Class          Date            College¹ and sex       N     Mean    SD     SD_mean

Form A
[illegible]    1938            Men (C)                38    6.69    .93    .12
[illegible]    1938            Men (C)                73    6.61    .91    .11
Freshmen       1940            Women (E)              50    6.60    .89    .13
Freshmen       1940            Men (C)                69    6.58    .83    .10
Freshmen       1941 (Sept.)    Men (C)                49    6.58    .92    .13
Seniors        1941            Men (C)                50    6.68    .80    .11
Mixed group    1941            Men and women (N)      65    6.70    .87    .11

Forms A + B
Freshmen       1938            Men (C)                73    7.12
Freshmen       1938            Men (C)                34    7.24
Freshmen       1940            Men (C)                69    7.03    .90    .11
Seniors        1941            Men (C)                50    7.01    .90    .13
Mixed group    1941            Men and women (N)      65    6.99

¹ (C) refers to Clark College, (E) Endicott, and (N) Nebraska.


So much for the trends in attitude following Munich, as revealed
by the averages on the Thurstone Scale. Again let us look at some
results on individual items. This time there are data not only on the
Thurstone Scale but also on a scale constructed in September 1940 by
the writer to bear directly upon the problems of peace and war as they
appeared to Americans at that time. This scale consisted of twenty-six
items, some of which were designed to measure attitudes toward
war as colored by economic considerations, others to measure attitudes
as colored by political considerations, others by humanitarian, others
by idealistic, and still others by considerations of defense of home
and country. Results will be given in Table II to show the degree of
acceptance of a sample of the items, and the reader can be the judge
whether the wording of items in terms of certain of these frames of
reference seems to make any difference.

TABLE II.—ATTITUDES TOWARD THE WAR AS EXPRESSED BY HIGH-SCHOOL AND
COLLEGE STUDENTS, SEPTEMBER 1940 TO FEBRUARY 1941
(Results Given in Terms of Percentages Endorsing Each Item)

                                                      High-school
                                                      juniors and    College
                                                      seniors        freshmen

Part I: I (am or am not) opposed to war and (do or
do not) believe it an adequate method of solving
international problems (and or but) taking all
things into account as the world is today:¹

 1. The United States should go to war if she is
    attacked                                              97            97
 2. The draft as preparedness measure is wise             96            87
 3. The United States should give all possible aid
    short of war to the British Empire in the
    present conflict                                      80            76
 4. Such hostile ideologies as Fascism and Nazism
    must be suppressed, by force if necessary,
    before democracy [illegible]                          87            58
 5. The United States should go into the present
    war immediately or as soon as we are
    adequately [illegible]                                 2             3

Part II: I (grant or do not grant) the possible
“dangers” to America from any upset in the old
balance of power in Europe and Asia . . . (but or
and):²

 1. The United States will have lost more than it
    gains if it goes to war as a result of any
    condition developing out of the present
    conflict                                              63            69
 2. The United States should refuse to go to war,
    regardless of any apparent losses which seem
    imminent at the time when she is tempted to
    fight                                                 35            39
 3. The suffering of Americans resulting from a
    war would be much greater than would result
    from any non-war policy, even if it led to
    domination of the United States by some
    foreign power                                         16             4
 4. If a new order wins out in Europe, it must
    have some definite merits, and we should never
    go to war to keep it from spreading                   29            10
 5. It would be better for the United States
    Government to spend its money to make
    democracy work effectively rather than spend
    it for national [illegible]                           20            11
 6. Young people are not really being told the
    most important things that are going on behind
    the scenes in Washington, and yet they are
    supposed to support the draft enthusiastically        62            60

¹ Tabulation based on students saying, “I am,” “do not,” “but,” which
included practically all cases.

² Tabulation based on students saying, “grant,” “but,” which included
practically all cases.

The most striking result in Table II is the apparent inconsistency 
in attitude as the issues of war and peace are presented in different 
frames of reference. When the issue of war is presented in terms of 
“defense if attacked,” the vote for war was ninety-seven per cent 
with both high-school and college groups; but when it came to “sup- 
pression of hostile ideologies by force if necessary,” there was only 
fifty-eight per cent acceptance by college freshmen. In Items 1
and 2, Part II, there was considerable reluctance to vote in favor of
war when the issues were stated in terms which smacked of economic
gains and losses. Item 5, Part I, shows how weak the support was
among high-school students and college freshmen for offensive war
as contrasted with the very strong support for defensive war (Item 1).
From these results it is apparent how meaningless an “average”
attitude based on all such items would be in the case of an individual.
These findings, plus more to follow, indicate clearly that there is no
such thing as “an attitude” toward war; there are, rather, “attitudes”
toward war. 

Turning to the results on the Thurstone Scale, we find the same
phenomenon. Table III gives a sample of the difference in popularity
of two pairs of items in which the scaled values are closely similar.
The results are given separately for the Clark and the Nebraska
groups. The Clark group in this case are freshmen tested in
September 1940, while the Nebraska group consists mainly of juniors
tested in February 1941. Here Item 1, which is stated in terms of
defense, receives high acceptance, while Item 2, with almost identical
scale value but stated in terms of such idealistic concepts as “rights”
and “lesser of two evils,” was accepted only about one-half as
frequently. Items 3 and 4 also have rather similar scaled values but,
sampling attitudes as they do in very different settings, they are far
from equal in the frequency with which they are endorsed.¹ It is
possible that this means that attitudes toward a complex issue like
war are piecemeal rather than global phenomena, and that when we
come to study changes in attitudes we might best look for modifications
in items whose frames of reference are most affected by changes in
world conditions.

TABLE III.—DIFFERENCE IN THE POPULARITY OF ITEMS OF CLOSELY SIMILAR
SCALED VALUES
(Results Given in Terms of Percentage Endorsing the Item)

                                                                Percentage agreeing
                                                     Scaled
                                                     value      Clark     Nebraska

 1. It is our duty to serve in a defensive war        4.1        87         86
 2. Because right may be more important than
    peace, war may be the lesser of two evils         4.2        38         50
 3. Compulsory military training in all countries
    should be reduced but not eliminated              5.4        59         64
 4. Only in a war in which moral issues are
    clearly at stake is the individual under
    [illegible]                                       5.8        27         18

The final result to be presented on the 1938-1941 period has to do
with the verbal reports of students when they were individually
interviewed concerning the apparent inconsistencies in their attitudes
toward war. Eighty-one students chosen at random were interviewed
by a graduate student working under the writer’s supervision in 1939.
This study revealed two main facts:³ (1) That there were very great
differences in the popularity of items at every level on the scale, and
(2) that each student had ready “explanations” to offer in every case
as to why he endorsed one statement and not another. If, for example,
a student endorsed an item with a scaled score of 4.1, failed to endorse
one of 4.2, and then endorsed one at 9.0, he would readily defend this
as being consistent from his point of view. The answers given by the
students as to why they endorsed certain statements and did not
endorse others seemed definitely to be more than mere rationalizing
or defending their previous responses.

¹ One might, of course, say that this result shows only that the standardization
of the scale was inadequate, or that a standardization made in the late 1920’s was
inadequate for use in 1940. But we can only say that the scale was carefully
standardized by Thurstone at that time, and if a scale standardized for measuring
attitudes at one time is not adequate for use ten or twelve years later, that is an
important fact to know about the nature of attitudes and the problem of their
measurement.

² Since these are but samples of popular and unpopular items, one might well
ask about the degree to which popular and unpopular items remained so over
several years. In other words, if one were to arrange the items on a scale in order
of popularity from highest to lowest in one group of subjects and do the same in
another group, what would be the correlation between the two series? This was
done in 1939 with two rather small samples, and an r of .97 (rank method) was
found for Form A and .94 for Form B.

³ For these results the writer is indebted to Elinor Brown: “A Study of the
Inconsistencies in the Attitude toward War among College Students.” Master’s
thesis, 1939, on file at Clark University Library.

The most frequent “explanations” given for checking items which
were distant from the individual “average” position on the scale, or 
otherwise inconsistent with other items checked, were: (1) That the 
items simply stated a fact, (2) that it was a statement of an ideal or 
humanitarian principle and as such was nothing which one could 
refuse to endorse, (3) that it agreed with the subject’s political or 
economic viewpoints, and (4) that it was an innocuous, toothless 
statement that was accepted because there was no particular reason 
not to. These are given in descending order of frequency. The 
reasons given for not endorsing items which were near the student’s
“average” position were, in descending order of frequency, as follows:
(1) That the statement was vague and lacking in sufficient qualifica- 
tions to make sense, (2) that it was contrary to the subject’s economic 
or political views, (3) that it contained poorly defined catch-phrases, 
and (4) that it was contrary to fact. 

Thus, again, we see that the attitudes which are being measured on 
the Thurstone Attitude Toward War Scale are by no means homo- 
geneous. Some statements are endorsed or rejected from one view- 
point or frame of reference, while others are endorsed or rejected from 
another. The question which persistently arises in this connection 
is whether or not changes in attitudes might not be very great in one 
of these frames of reference, and possibly affect specific lines of action 
very markedly, without showing up in the other frames and without 
making much impression on so-called “average” attitudes. This
question should be kept in mind as the evidence on the final period, 
that after Pearl Harbor, is examined. 

After Pearl Harbor.—Between December 15 and 20, 1941, all the 
freshmen and seniors in Clark College were measured on the attitude 
scales. In the case of the freshmen, a matching of each man’s post- 
Pearl Harbor attitudes with his September, 1941, attitudes was used 
(N = 64); with the seniors the matching was with February 1941 
attitudes (N = 37). On “average” attitudes, as measured on the
Thurstone Scale, the mean change was zero in the freshman group 
and +.32 (toward militarism) in the senior group. The smallness 
of this shift after so drastic a change in national affairs would seem 
incredible were it not for the clue which has been gradually developing. 










The smallness of this difference did not jibe with the results of casual 
observation of these students. Moreover, when they were asked at 
the end of the testing period to report on their papers if they thought 
their attitudes toward war and peace had notably changed in recent 
months or years, sixty-one per cent of the seniors and seventy-seven 
per cent of the freshmen answered in the affirmative; sixty-seven per 
cent of those so answering assigned to first place, among the factors 
contributing to such a change, the Japanese attack on Pearl Harbor. 

A result more in keeping with expectations than that based on
“average” attitudes is given in Table IV.

TABLE IV.—A COMPARISON BETWEEN SEPTEMBER AND POST-PEARL HARBOR
ACCEPTANCE OF CERTAIN ITEMS AMONG COLLEGE FRESHMEN
(Frequencies Given in Percentages)

                                                           Freshmen

                                                      Sept.     Dec. 17-20

Part I¹
 1. The U. S. should go to war if she is attacked       95         100
 4. Such hostile ideologies as Fascism and Nazism
    must be suppressed, by force if necessary,
    before democracy [illegible]                        62          82

Part II¹
 1. The U. S. will have lost more than it gains if
    it goes to war as a result of any condition
    developing out of the present conflict              71          46
 2. The U. S. should refuse to go to war,
    regardless of any apparent losses which seem
    imminent at the time when she is tempted to
    fight                                               38          10

¹ See Table II for a statement of the covering statement.

Here we have selected from the Jones Scale a sample of items for which
the frame of reference is
defense or safety of home, country, and the American way of life.
In this table it will be seen that there is a large change in the direction
of militarism in every item. A few similar items could be drawn from
the Thurstone Scale with similar changes. For example, the item,
“It is our duty to serve in a defensive war,” was endorsed by
ninety-seven per cent of the seniors after Pearl Harbor, whereas, in September
1940, eighty-five per cent accepted it, and in 1934 only sixty-five per
cent. On items which measured attitudes toward war within such
frames of reference as the humanitarian, the idealistic, the factual,
or the economic, the changes were small or negligible. This is easy
to believe because it seems plausible that the attack on Pearl Harbor
and the subsequent declarations of war would not have affected very 
much these last-named frames of reference and, therefore, any attitudes 
toward war or peace which were looked at within these frames of 
reference would show small changes. The frame of reference which 
was most directly affected by the events of Pearl Harbor was that 
involving safety and security of home, country, and the democratic way 
of life. When American territory was attacked and war was declared 
against us, this safety and security were called into question, and with 
the changing of this frame of reference the attitudes within that frame 
changed. The modification of attitudes within this frame of reference 
did not have to wait for changes to take place in all other frames of 
reference. Indeed, as everyone knows, the attitudes of opposition to 
war and its destruction of life and property can exist simultaneously 
with a determination to blast the enemy and all that he owns until he is 
removed as a threat to one’s feeling of safety and security. Reports 
from England following the devastating air raid on Cologne were to 
the effect that instead of rejoicing among the fliers there was quiet 
and confidence mingled with a little remorse that so inhumane a thing 
had to be done. “It seemed a pity,” said one of the pilots who had
dumped a ton or so of explosives. This is equivalent to saying that 
his attitude toward war in a humanitarian frame of reference was 
different from that in the safety and security frame. From the 
humanitarian viewpoint it seemed a pity. 

But let no one think that the raiding by the British or any of the 
United Nations will stop on this account. Nothing that the enemy 
can do by propaganda methods to play upon this humanitarian frame 
of reference (which earlier he mistook for softness) will change to his 
advantage the frame of reference which is now in the ascendancy— 
that is, nothing until the threat to the United Nations’ feeling of 
security of home and country has been removed. Once the enemy is 
vanquished and the threat to security is removed, there is reason to 
believe that the peace will be very different from what the Axis would 
enforce if they were victorious. These humanitarian and idealistic 
frames of reference will, in the case of democracies, still be alive when 
the war is over (although they may be different from now), and these 
may influence in very important ways the attitudes and actions at the 
peace tables and thereafter. 

At least this is the way it looks from the study of student attitudes. 
The young people are fighting this war with mental attitudes of
hostility in certain frames of reference. They are determined to
vanquish the enemy as a threat to their own way of life. After he is 
defeated and is no longer a threat to their security, they will be willing, 
unless their frustrations greatly increase and their education goes 
awry, to help him build back in every way, short of becoming a threat 
to safety and security all over again. 


SUMMARY AND CONCLUSIONS 


The main point coming out of this investigation is that attitude 
toward war within one frame of reference may change drastically 
without corresponding changes in attitudes in other frames of reference. 
The possibility that such compartmental changes could take place 
was demonstrated in the first two periods covered in this article by 
the fact that each individual tested swung widely up and down the 
so-called militarism-pacifism scale as he marked the different items. 
That is, the individual’s attitudes, as expressed on the specific items, 
seemed to be quite independent of his “average” attitude. After 
Pearl Harbor what had seemed a possibility became a certainty. 
Striking changes took place on a few items, while little or no changes 
took place on others. The relatively insensitive items vastly out- 
numbered the sensitive ones, and the change in “average” attitudes 
was very small, ranging from zero in one class to .32 in another. 
The sensitive items were found to be those worded in such a way as to 
come within the frame of reference of defense, safety, and security 
of home, country, and the democratic way of life. What happened 
at Pearl Harbor and the declarations of war following impinged 
directly upon this frame of reference. Such frames of reference, 
however, as the humanitarian, the idealistic, and the economic were 
not appreciably affected, and the items within these frames did not 
change. 

The significance of this point for the understanding of popular 
attitudes toward war and for psychological and educational planning 
seems to be three-fold: 

(1) It has a bearing on the question as to what American youth 
are fighting for. Reference has already been made to the words of 
one of the young British airmen who helped to bomb Cologne. It is 
not merely superficial to say that the youth of America are fighting 
for victory. Within the frame of reference of safety and security of 
home, country, and a way of life which they think is worth defending, 
they have come to the unalterable attitude that here is a job to be done, 
a threat to be removed, an enemy to be vanquished. Within the 
humanitarian and the idealistic frames of reference they hate war and 
have little confidence in it alone as a method of solving international 
disputes. But they hate the threat to their security and way of life 
more, and they are fighting for victory. It does not weaken their 
determination or efficiency in war to be opposed to war in certain 
frames of reference. Indeed, it may help their determination and 
effort, for they may gradually integrate their attitudes of hostility 
against aggressor nations, with humanitarian and idealistic considera- 
tions to the effect that they are fighting for a victory which will not 
permit such aggressor tactics in the future. 

(2) It has bearing upon the possible resurgence of conflicting 
attitudes after the war is won. With attitudes within one frame of 
reference being so largely in the ascendancy today, there is an apparent 
unanimity of purpose in the war effort which conceals varieties of 
attitudes toward war and peace. These may reappear when the 
victory is achieved. Attitudes toward war and peace within the 
economic frame of reference, for example, may, if not prepared for 
far ahead of time, lead to much bitterness after the international 
victory is won. With the cessation of international hostilities will 
come a taking of stock of the sacrifices, the losses, and the gains in the 
war effort. It is then that the varieties of attitudes as to what we 
were fighting for may seem greater and more of a challenge to leader- 
ship than today. 

(3) It has bearing on education and propaganda. It is both a 
danger and a challenge when attitudes within one frame of reference 
can be changed and translated into action by a short-circuiting process, 
so to speak, without the mediation of other relevant attitudes which it 
has taken years to develop. It is a danger because propaganda, and 
other devices besides education and reason, may quickly bring about 
changes in attitudes and actions which are socially undesirable. It is a 
challenge because attitudes within a given frame of reference can be 
modified to meet changing problems in society without the inertia 
involved in changing all attitudes which might directly or indirectly 
be related to the issue. For education to keep ahead of propaganda 
and desires of many for special privileges will call for eternal vigilance 
on the part of the schools, colleges, and other agencies dedicated to 
truth and human justice. The race for the control of attitudes will 
be between education and humanitarian values on the one hand, and 
propaganda of pressure groups on the other. 








THE STUMP AUDITORY GROUP TESTS 
OF INTELLIGENCE 


N. FRANKLIN STUMP 
Department of Psychology and Education, Keuka College, Keuka Park, N. Y. 


NATURE OF THE AUDITORY GROUP TESTS 


The purpose of this article is to describe a new type test of intelli- 
gence on which the author has been working for a number of years. 
The nature and value of these Auditory Group Tests are presented. 
Reliability coefficients, validity measures, age norms, and percentiles 
have been calculated for the tests. 

The Stump Auditory Group Tests measure functions in which the 
school and society are particularly interested, i.e., the ability to under- 
stand and reason by means of the spoken word, and the capacity to 
listen keenly. Both factors are important in the interpretation of 
what is heard. Thus, the development of this type of examination 
was motivated by its practical values. 

The Auditory Tests require a different procedure in the presenta- 
tion of items to children than is used in the administration of printed 
intelligence examinations. An entirely new set of items was tried out 
for the tests, since items which have been evaluated by requiring 
children to read them cannot be used in an auditory test without 
making undesirable assumptions. So, in the experimental period 
the items were presented orally by the author, after which the children 
reacted by writing their responses on mimeographed blanks. In order 
to insure uniformity of presentation the items which met significant 
criteria for the measurement of intelligence were then recorded by 
electrical transcription and were presented for standardization by 
means of the phonograph. 

Forms A and B of the Stump Auditory Tests of Intelligence have 
been prepared. 


VALUES AND SPECIAL FEATURES OF THE TESTS 


Results Secured by the Current Intelligence Tests are Influenced by 
Reading Ability —There are some children whose intelligence ratings 
are affected by poor reading ability. A bright child who has not 
learned to read may be penalized by current tests of intelligence which 
place a premium upon this skill. For this reason it is of practical 
concern that reading handicaps be eliminated, or that a measure of 
general intelligence be applied which does not stress these skill defi- 
ciencies, before arriving at a judgment of a person’s mental level.’ 

Many psychologists believe that reading comprehension is a 
criterion of intelligence. The author is inclined to agree with this 
viewpoint. However, the purpose in constructing the tests was to 
insure that a child is not handicapped by reading disabilities or by 
lack of opportunity to learn to read.’ 

It is a well-known fact that the Binet tests of intelligence are rela- 
tively independent of the factor of reading. Whether Binet con- 
sciously sought to eliminate the effects of reading is not known; but 
when the latest edition of the Stanford Revision of the Binet examina- 
tion is considered, only four tests of the one hundred twenty-nine 
(alternates included) in each of the two Forms (L and M) actually 
require reading ability. Thus, ninety-seven per cent of these tests 
might be regarded as listening or auditory tests. 
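The percentage just quoted follows from the two counts given in the text. A brief check of the arithmetic, with variable names chosen here purely for illustration:

```python
# Counts quoted in the text for each Form of the Stanford Revision.
total_tests = 129              # alternates included
tests_requiring_reading = 4

# Share of tests that can be given without any reading by the child.
auditory_share = (total_tests - tests_requiring_reading) / total_tests
auditory_percent = round(100 * auditory_share)

print(f"{auditory_percent} per cent of the tests do not require reading")
```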

Coachability of the Auditory Tests Can Be More Carefully Controlled 
than in the Case of the Printed Tests.—The group Auditory Tests 
presented to the children by means of electrical transcription can be 
freed of the possibility of coaching to a much greater extent than 
the printed intelligence tests, copies of which can be obtained with 
little effort; many of them having been published in great part in 
books and magazines. Thus, it is more difficult for children to become 
“test wise” on the auditory type of measure.® 

It is the intent of the author that the Auditory Tests of Intelligence 
be sold only to competent individuals who are entirely trustworthy 
in the use of the materials. These persons will realize that the tests 
are tools of investigation, that they form a basis of important experi- 
ments, and they furnish the data for significant theory and practice. 
It is the hope that the materials contained in them will be scrupu- 
lously safeguarded. 

Listening (Auditory) Ability Plays a Major Rôle in Communica- 
tion.—Occasionally the attention of test-makers has been directed to 
the importance of measuring abilities which ought to be developed 
in individuals. Listening ability should play a larger rôle in our 
testing programs because of its wide use in life situations. Research 
indicates that when data of the various types of communicative 
activity are studied, in order to determine the relative importance of 
these abilities in terms of the frequency of use in life, listening occurs 
almost three times as much as reading.® 

The Auditory Tests Are More Economical than the Current Forms 
of Group Intelligence Measures.—It can be easily demonstrated that 
an Auditory Test is economical, since the same tests (on phonograph 
records) can be used for testing thousands of children. Mimeographed 
sheets on which children write their responses are the only new mate- 
rials required at each administration of the tests. Besides cost, other 
factors, considered important when the choice of a test is made, were 
considered: The consumption of time for administration; and its value 
as a test. If materials have been provided in advance, forty-five 
minutes seem sufficient for administering each form of this test. Its 
value as a test has been made clear in other parts of this paper. 

The Time Factor per Unit of Work Is Held Constant for All Children. 
The factor of speed, emphasized in many of our intelligence tests, 
has been greatly criticized. In this Auditory measure every pupil 
works on the same problem at the same time. In this regard the 
tests offer advantages beyond the printed ones. On the printed test 
the child who works very slowly is often penalized for his thoroughness, 
while the child who works rapidly frequently increases his score 
because he attempts many more problems than the slow child. The 
Auditory Tests are largely independent of the speed factor, since 
every item can be attempted by every child. This consequently 
overcomes one of the greatest criticisms of many group intelligence 
tests.*:® 

In the Auditory Tests the Defects of Alternative Answers Are Elimi- 
nated.—The recording of answers in the tests calls for an amount of 
creative work on the part of the children. In place of suggested 
answers, as in the alternative type of test, the subject must use the 
materials which are auditorily presented to him, and then seek a 
response as an end result. This may require him to draw upon a 
large arc in the cycle of experience, or it may require only going through 
a small arc of his experience in arriving at the end result. This is 
what happens in the completion tests and for this reason they have 
been widely accepted in the measurement of intelligence.*:!° 

The Auditory Tests Should Insure Uniformity of Administration 
in Different Situations—Many standardized tests are administered 
often in an extremely slovenly manner. Sometimes the examiner 
fails to give or clarify the exact instructions which accompany the 
tests. Oftentimes examiners do not fully realize that instructions 
which are given the subjects may completely invalidate the tests. 
Thus, the method of administering the Auditory Tests, by the phono- 
graph method, should result in greater reliability because of increased 
uniformity in the techniques of administration.? 

The Auditory Tests Maintain a Higher Level of Attention than the 
Orthodox Examinations.—Experience of the author in administering 
the Auditory Tests indicates, beyond doubt, that there is a constant 
stimulation of the children. It is believed that the auditory factor 
helps to reduce fatigue, satiation, or boredom because auditory stimu- 
lation and experiences are dynamic. Thus, the constant stimulation 
which children receive from these tests is naturally desirable since 
any factor which motivates children to work to capacity during the 
administration of any test will tend to give a truer score of the indi- 
vidual’s latent potentialities. ! 


RELIABILITY OF THE TESTS 


The reliability measures for the Auditory Group Tests of Intelligence 
were derived under three conditions, from the reactions of 
seven hundred thirteen children. The relationships between Form A 
and Form B were analyzed for a group of three hundred seventy-six 
rural-school children in the second through the eighth grade; a group 
of one hundred thirty-five centralized-school children in the second 
through the tenth grade; and a group of two hundred two city-school 
children in the fifth, sixth, and eighth grades. The rural children 
were enrolled in one- and two-room schools. Many of the children 
in the centralized school were reared in the country but were receiving 
instruction in a departmentalized school. The city-school children were 
enrolled in five schools located in a town of thirty thousand population. 

Reliability Not Affected by Acoustical Factors—The Auditory Tests 
were administered in one- and two-room schools to three hundred 
seventy-six rural children. Many of these buildings lacked electrical 
appliances, so it was necessary to use an old-fashioned type of phono- 
graph. The acoustical properties of the rooms were not as adequate 
as those found in centralized and city schools. In contrast to this 
situation, the tests were administered in the centralized and city 
schools by means of electrical amplification. In spite of these appar- 
ent variations of equipment and acoustical properties, it is noted that 
the reliability coefficients for the total tests are nearly as high for the 
rural schools as for the centralized and city schools. 
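The coefficients in Table I are described as corrected by the Spearman-Brown formula. The paper does not state which correlations were stepped up or by what factor, so the sketch below shows only the general formula, r_k = k r / (1 + (k - 1) r), with the common doubling case k = 2 as the default. The function name and the sample value .80 are illustrative assumptions, not figures taken from the study.

```python
def spearman_brown(r: float, k: float = 2.0) -> float:
    """Predicted reliability of a test lengthened k times,
    given the reliability r of the test at its original length."""
    if not -1.0 < r <= 1.0:
        raise ValueError("r must be a correlation greater than -1 and at most 1")
    return k * r / (1.0 + (k - 1.0) * r)


# Hypothetical uncorrected coefficient of .80, stepped up for doubled length.
corrected = spearman_brown(0.80)
print(round(corrected, 3))
```

With k = 1 the function returns the coefficient unchanged, which is a convenient check that the formula has been entered correctly.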

The effect of acoustical properties on the results of the tests is 
apparently not as serious as some persons might suspect. At least, 
the reliability coefficients are entirely satisfactory. 

It might be concluded that for children without serious auditory 
defects, the test results seem quite reliable, even when given under 
only fairly favorable classroom conditions. The author of the Auditory 
Tests administered or supervised the administration of these 
tests, so he is fully aware of the varying acoustical conditions of the 
classrooms. Thus, coöperation of the children, the novelty of the 
situation, and other factors mentioned above seemed to compensate 
for manifest variations of acoustical conditions. 


TABLE I.—RELIABILITY COEFFICIENTS (CORRECTED BY THE SPEARMAN-BROWN 
FORMULA) AND INTERCORRELATIONS ON SECTIONS OF THE STUMP AUDITORY GROUP 
TESTS OF INTELLIGENCE FOR 376 RURAL-SCHOOL CHILDREN; 
135 CENTRALIZED-SCHOOL CHILDREN; AND 202 CITY-SCHOOL 
CHILDREN—FORM A AND FORM B* 

                                        Form A
Form B           N     Total        Information  Arithmetic  Analogies  Opposites
Total            376   [illegible]  .80          .80         .81        .84
                 135   .92          .80          .82         .81        .86
                 202   .91          .88          .74         .89        .80
Information      376   .82          .82          .86         .67        .75
                 135   .87          .87          .73         .77        .82
                 202   .81          .94          .75         .73        .84
Arithmetic       376   .80          .65          .86         .63        .60
                 135   .86          .71          .91         .71        .78
                 202   .74          .70          .87         .60        .61
Analogies        376   .77          .69          .65         .86        .70
                 135   .98          .70          .73         .90        .77
                 202   .76          .71          .61         .84        .72
Opposites        376   .92          .72          .71         .67        .88
                 135   .85          .71          .64         .85        .90
                 202   .77          .80          .64         .69        .83

* The diagonal figures (italic in the original) are reliability coefficients. 


VALIDITY 


Validity is an essential aspect of any test. An adequate descrip- 
tion of validity is more difficult with a new type examination, such 
as an auditory test, than with the widely used current tests. When 
children are required to read items, direct comparisons cannot be 
made with tests which are presented by the auditory procedure. For 
this reason, during the experimental period each item was subjected 
to very careful scrutiny, and five criteria had to be satisfactorily 
met before items were considered acceptable for the final tests. All 
these factors also contribute to the high reliability of these measures. 

The following criteria were applied to each item: 

(1) Suitability for auditory presentation on the basis of previous 
research, and that of the author. Attention was given to (a) the 
contribution which research has shown various typical items to 
make to the measurement of intelligence; (b) the adaptability of 
the items for auditory presentation, to insure objectivity in scoring. 

(2) Selection according to difficulty. By this procedure it was 
possible to arrange the test items in order of difficulty and to space 
them at definite points along a scale of difficulty. The distance in 
difficulty from one item to another was made equal throughout each 
of the four sections of the test, being separated by a .10σ value. 

(3) Selection on the basis of the increase in achievement from 
age to age. It has been amply demonstrated that there is a continuous 
growth with age of those mental functions which are commonly 
required as significant in intelligent behavior. At least, this growth 
continues until the attainment of a maximum mental level, rate and 
maximum depending, of course, upon the individual. Since it is 
known that intelligence is to a certain extent a function of age, a test 
to be valid must show an increase from year to year in the percentage 
of unselected children that pass it. Terman says, “This is the 
criterion on which Binet chiefly relied. Nearly all of the tests which 
he finally included in his scale satisfy this criterion fairly well, though 
some show a more rapid increase than others.”11 

(4) Treatment from the standpoint of the criterion of coherence. 
This criterion aims to separate the influence of intelligence from that 
of age. It consists of studying the responses of a group of unselected 
children of one chronological age and finding the percentage of success- 
ful responses separately for the bright, average and dull of that group. 
For an item to be selected, the bright children had to exceed the 
average children in percentage of correct responses; the average 
children had to excel the dull. In this manner the goodness of each 
item was determined by its comparison with the entire test. Thus, 
each item was retained for the final tests or eliminated with respect 
to the validity of the entire test. This criterion was applied to every 
unselected age group in which tests were finally to be used, i.e., eight 
through fourteen years. 

(5) Study of the reactions of older dull children and younger 
bright children, with the purpose of removing the effect of training. 
Each item was studied with regard to the comparative success of a 
group of chronologically young, but mentally bright children, and a 
group of children mentally dull but chronologically old. This criterion 
guards against including materials in an intelligence test which are 
greatly influenced by the experience factor. It is conceivable that a 
consideration of the longer period of exposure to environmental and 
training influences of the older and dull children, as compared to the 
shorter period of the younger and bright children, is profitable in the 
selection of items. 

(6) Comparison of test results with teacher’s ratings. Teachers 
were requested to judge children who took the tests with regard to 
three characteristics: (a) Quality of school work; (b) Intelligence; 
(c) Social status. The Pearson correlations (based on two hundred 
children) between these factors and the scores on the Auditory Tests 
compare favorably with those obtained by Terman, when the first 
Stanford Revision of the Binet Tests was related to these same factors. 
The correlation between scores on the final forms of the Auditory 
Tests and the quality of school work, as judged by the teachers, is 
+.54; between these tests and intelligence as estimated by the teachers 
+.58; between these tests and social status +.44. For the same 
relationships Terman reports the following correlations for the Binet 
test: +.45; +.48; and +.40, respectively. 
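Criteria (3) and (4) above amount to two simple checks on each item. The following is a minimal sketch of those checks only; the function names and all of the percentages are hypothetical illustrations and are not data from the standardization.

```python
from typing import Dict, Sequence


def passes_age_criterion(percent_passing_by_age: Sequence[float]) -> bool:
    """Criterion (3): the percentage of unselected children passing
    the item must increase from each age to the next."""
    pairs = zip(percent_passing_by_age, percent_passing_by_age[1:])
    return all(later > earlier for earlier, later in pairs)


def passes_coherence_criterion(percent_correct: Dict[str, float]) -> bool:
    """Criterion (4): within one chronological age, the bright must
    exceed the average, and the average must exceed the dull."""
    return (percent_correct["bright"]
            > percent_correct["average"]
            > percent_correct["dull"])


# Hypothetical item: percentage passing at ages eight through fourteen.
item_by_age = [12, 25, 41, 58, 70, 81, 90]
# Hypothetical results for one age group, split by brightness.
item_at_one_age = {"bright": 64, "average": 40, "dull": 19}

keep_item = (passes_age_criterion(item_by_age)
             and passes_coherence_criterion(item_at_one_age))
print(keep_item)
```

In the procedure described in the text the coherence check was repeated for every age group from eight through fourteen years; the sketch applies it to a single group for brevity.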

Final Selection of Items for Form A and Form B.—During the 
experimental period, four tests (two hundred sixteen items in each)— 
Information, Arithmetic, Analogies, and Opposites—were adminis- 
tered orally to seven hundred fifty children ranging from eight to 
fourteen years of age. For these composite experimental items a 
correlation of +.74 was obtained with intelligence quotients from 
Terman’s Mental Ability Test (one hundred unselected fourteen-year 
old children). A coefficient of +.80 was found between the total 
auditory scores and the Otis Self-Administering Test of Mental 
Ability (thirty-five unselected fourteen-year old children). For the 
former group, the intercorrelations for various sections of the test 
range from +.59 to +.90; for the latter group from +.68 to +.92. 

In the final tests each Form has a total of one hundred thirty-four 
items; i.e., Information, thirty items; Arithmetic, thirty; Analogies, 
thirty-five; and Opposites, thirty-nine. 

Age Norms.—The norms (based on seven hundred forty cases) 
for Form A and Form B of the Auditory Tests for the various ages 
may be taken from Tables II and III. For example, a child who 
earns a score of 33 has a mental age of ten years and five months. 

The A and B forms were designed in parallel, almost item for item, 
but it seems desirable to supply a norm table for each Form. This 
procedure eliminates any assumption that the forms will coincide 
in difficulty over all ranges of performance. 
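The reading of a mental age from the norms can be sketched as a table lookup. The excerpt below copies only the Form A columns for ages nine through eleven from Table II. The rule for scores that appear at more than one month (take the oldest age whose norm does not exceed the score) is an assumption made here, since the text does not state one, and the names are illustrative.

```python
from bisect import bisect_right

# Excerpt of the Form A norms (Table II): the score expected at each
# month (0-11) of ages nine, ten, and eleven.
FORM_A_NORMS = {
    9:  [17, 18, 19, 21, 22, 23, 24, 25, 26, 26, 27, 28],
    10: [29, 30, 31, 31, 32, 33, 34, 34, 35, 36, 36, 37],
    11: [37, 38, 39, 39, 40, 40, 41, 42, 42, 43, 43, 44],
}

# Flatten into parallel lists ordered by age; scores never decrease.
AGES = [(year, month) for year in sorted(FORM_A_NORMS) for month in range(12)]
SCORES = [FORM_A_NORMS[year][month] for year, month in AGES]


def mental_age(score):
    """Return (years, months) for the oldest age whose norm is <= score."""
    if score < SCORES[0] or score > SCORES[-1]:
        raise ValueError("score lies outside this excerpt of the table")
    return AGES[bisect_right(SCORES, score) - 1]


print(mental_age(33))
```

For the score of 33 used as the example in the text, the lookup gives ten years and five months.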


TABLE II.—AGE NORMS FOR THE STUMP AUDITORY TEST OF INTELLIGENCE—FORM A 
N = 740 

                               Years
Months     7     8     9    10    11    12    13    14    15    16
   0      ..     7    17    29    37    44    51    57    63    69
   1      ..     7    18    30    38    45    52    58    63    70
   2      ..     8    19    31    39    46    53    58    63    70
   3      ..     8    21    31    39    46    53    59    64    71
   4      ..     9    22    32    40    47    54    59    64    72
   5      ..     9    23    33    40    47    54    60    65    72
   6       5     9    24    34    41    48    55    60    65    73
   7       5    11    25    34    42    48    55    60    66    ..
   8       5    12    26    35    42    49    56    61    66    ..
   9       6    13    26    36    43    50    56    61    67    ..
  10       6    14    27    36    43    50    56    62    68    ..
  11       6    16    28    37    44    51    57    62    68    ..

TABLE III.—AGE NORMS FOR THE STUMP AUDITORY GROUP TEST OF 
INTELLIGENCE—FORM B 
N = 740 

                               Years
Months     7     8     9    10    11    12    13    14    15    16
   0      ..     7    17    29    39    48    56    64    71    78
   1      ..     8    18    29    40    49    57    65    72    78
   2      ..     8    19    30    41    50    58    65    72    79
   3      ..     9    20    31    42    50    58    66    73    80
   4      ..     9    22    32    43    51    59    66    73    80
   5      ..     9    23    33    44    51    60    67    74    81
   6       5    10    24    33    45    52    61    68    74    81
   7       6    11    25    34    46    52    61    68    75    ..
   8       6    12    26    35    46    53    62    69    76    ..
   9       6    13    26    36    47    54    62    69    76    ..
  10       7    14    27    37    47    55    63    70    77    ..
  11       7    16    28    38    48    55    64    70    77    ..

Percentiles—Some users of tests desire to interpret results in 
terms of centiles. Tables IV and V give deciles for Form A and 
Form B, respectively, for ages nine to fourteen. As the Tests are 
used more widely, extension will be made of the age norms and centiles. 


TABLE IV.—DECILES ON FORM A OF THE AUDITORY TEST FOR AGE GROUPS 
RANGING FROM 9 TO 14 YEARS 
N = 559 

                              Ages
Deciles      IX        X       XI      XII     XIII    IX-XII
                                                       composite
   90      45.13    54.00    61.72    71.17    80.20    66.27
   80      39.43    45.71    54.73    59.54    70.28    57.11
   70      34.13    42.50    51.10    55.08    62.57    51.70
   60      28.00    39.45    46.36    51.65    58.00    46.52
   50      24.17    34.17    40.72    47.86    54.50    41.77
   40      20.33    29.50    37.20    43.62    49.25    36.93
   30      13.38    24.50    31.75    37.80    44.70    30.49
   20       7.90    19.23    26.72    30.11    39.22    25.25
   10       4.32    15.39    17.60    20.60    31.80    15.56

















TABLE V.—DECILES ON FORM B OF THE AUDITORY TEST FOR AGE GROUPS 
RANGING FROM 9 TO 14 YEARS 
N = 556 

                              Ages
Deciles      IX        X       XI      XII     XIII    Composite
   90      48.88    52.92    68.42    74.75    83.19    72.79
   80      41.20    47.00    61.24    65.25    75.86    63.52
   70      30.75    42.50    56.03    61.00    70.15    56.89
   60      27.25    38.00    50.47    56.93    66.22    50.52
   50      24.17    33.25    45.44    51.66    60.84    45.28
   40      20.33    28.33    41.34    47.20    56.08    39.85
   30      15.50    23.75    37.55    42.33    51.23    34.51
   20       9.08    20.53    32.11    36.73    44.80    26.09
   10       6.21    14.58    21.19    24.40    34.93    17.29








In interpretation of very low test results teachers should use 
caution (and most teachers know which children have serious auditory 
impairments) that a child is not listed as a “mental defective” only 
to find out later that he is deaf. Thus, children who have serious 
auditory defects must be known as such before the test is given. 
Among the seven hundred forty children used in the standardization, 
only several pupils were pointed out by teachers as individuals lacking 
sufficient auditory sense to follow the tests satisfactorily, and these 
results were not included in determining norms. 
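For users who wish to place a score among the deciles of Tables IV and V, the reading can be sketched as follows. The figures are the Form A decile points for age X as given in Table IV. The function name, and the convention of reporting the highest decile point that the score equals or exceeds, are assumptions made for illustration.

```python
from bisect import bisect_right

# Form A decile points for ten-year-olds (Table IV, column X),
# listed from the 10th to the 90th.
DECILE_RANKS = [10, 20, 30, 40, 50, 60, 70, 80, 90]
AGE_X_FORM_A = [15.39, 19.23, 24.50, 29.50, 34.17, 39.45, 42.50, 45.71, 54.00]


def decile_reached(score, points=AGE_X_FORM_A):
    """Highest decile whose score point the child equals or exceeds;
    0 if the score falls below the 10th decile point."""
    index = bisect_right(points, score)
    return DECILE_RANKS[index - 1] if index else 0


print(decile_reached(36))
```

A ten-year-old with a score of 36 on Form A thus falls between the 50th and 60th decile points.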


CONCLUSIONS 


(1) The Auditory Tests should be used to supplement other types 
of intelligence tests when this complex function is being measured. 

(2) If cost of measurement is a major consideration, the Auditory 
Tests may be used for the classification of thousands of children 
without the purchase of printed forms. 

(3) Functions considered extremely important in everyday life, 
both inside and outside the school, 7.e., ability to understand and 
reason by means of the spoken word and the capacity to listen keenly, 
are measured by these tests. 

(4) The stimulation to work to capacity, which motivation the 
Auditory Tests give, may result in an accurate rating of an individual’s 
full potentialities, attention being held at a high level throughout the 
tests by means of the spoken word. 

(5) Reading deficiencies will not affect the ratings on these tests. 

(6) Speed of reaction is not emphasized, as is done in many group 
intelligence tests, since all children are permitted to attempt every 
item in the Auditory Tests. 

(7) The reliability and validity of these measures seem entirely 
satisfactory. 

(8) Variations in administration of these tests are eliminated 
since all instructions and items are presented by the phonograph. 
Uniformity is assured whenever and wherever they are administered. 


BIBLIOGRAPHY 


(1) Chapman, J. C.: “A Group Intelligence Examination without Prepared 
Blanks.” Jr. Edu. Research, Vol. ii, December, 1920, pp. 777-786. 

(2) Chapman, J. C.: “A Group Intelligence Examination without Prepared 
Blanks.” Jr. Edu. Research, Vol. xi, April, 1925, pp. 269-279. 

(3) Dearborn, W. F.: Intelligence Tests, 1928, pp. 202-203. 

(4) Freeman, F. S.: “Power and Speed: Their Influence upon Intelligence Test 
Scores.” Jr. Applied Psy., Vol. xii, December, 1928, pp. 631-635. 

(5) Freeman, F. S.: “The Factor of Speed and Power in Tests of Intelligence.” 
Jr. Experimental Edu., Vol. xiv, February, 1931, pp. 83-90. 

(6) Freeman, F. S.: “On the Improper Use of Psychological Tests.” School and 
Society, May, 1933, pp. 653-654. 

(7) Freeman, F. S.: Individual Differences, 1934, pp. 298-299. 

(8) Hahn, H. H.: “A Criticism of Tests Requiring Alternative Responses.” 
Jr. Edu. Research, Vol. vi, October, 1922, pp. 236-240. 

(9) Rankin, P. T.: “Listening Ability.” Ohio State University Bulletin, Vol. 
xxxiv, September, 1929, pp. 172-183. 

(10) Ruch, G. W. and Foster, R. R.: “On Correction for Chance in Multiple- 
response Tests.” Jr. Edu. Psy., Vol. xvi, January, 1927, pp. 48-51. 

(11) Terman, Lewis M.: Stanford Revision and Extension of the Binet-Simon Scale 
for Measuring Intelligence, 1917, p. 129. 





A CLASSROOM APPROACH TO THE IMPROVEMENT 
OF READING RATE OF COLLEGE STUDENTS 


HARRY GOLDSTEIN AND JOSEPH JUSTMAN 


School of Education, College of the City of New York 


The improvement of reading, originally considered an elementary- 
school function, is now recognized as one of the important problems 
of the secondary school and college as well. For the most part, the 
college has attempted to approach improvement in terms of a clinical 
study of the individual. The rôle of the classroom teacher within the 
remedial program has been almost completely neglected. It seems 
obvious that the problem is so acute that more and more individual 
teachers must take it upon themselves to help the student in their 
classes. 

The two experiments reported in this paper represent the attempt 
of a college teacher to use his classroom to develop a program designed 
to improve ability in one of the many aspects of reading—reading rate. 





PREVIOUS RESEARCH 


Perhaps the earliest experiment on the improvement of reading 
rate at the college level was conducted by Stone and Colvin more than 
twenty years ago.1 Forty-five students in a class in educational 
psychology, as part of a unit on “How to Study,” met for a total of 
thirty-five hours, thirty of which were spent in practice in reading 
textbooks in the field, the remainder in reading the social science peri- 
odical Outlook. Instruction was given in techniques of increasing 
reading speed and comprehension. Actual improvement was meas- 
ured in two ways: By a test in reading material in the field of educa- 
tional psychology, and by measurement before and after practice by 
means of the Monroe Silent Reading Test with Stone’s Extensions. 
On the test in educational psychology the class as a whole increased 
its score in rate of reading by more than one-half. Scores on the 
standardized test indicated a seventy-four per cent gain. This gain 
in speed of reading was more than three times that of a control class. 
The technique employed here was also used by the senior author with 
three other classes.2 He reported average gains in rate from thirty- 
five to one hundred eight per cent. 

Averill and Meuller3 undertook a similar study in which sixteen 
women met for three forty-minute sessions a week, for a period of 
twelve weeks. Initial and final speed of reading were tested by 
determining the average number of words read in two two-minute 
intervals. The test material consisted of selections from Emerson’s 
Essay on Eloquence. In the practice sessions, the subjects read from 
standard works of literature for a period of ten minutes, and then 
attempted to reproduce the material read. Throughout the entire 
experiment, the factors influencing the development of silent reading 
were emphasized, and each subject was urged to keep them in mind 
during the practice period. The authors reported a change in reading 
rate from two hundred fifty-two to five hundred four words per minute, 
a gain of ninety-nine per cent. Comprehension was not appreciably 
affected. 

Another early study, conducted by Remmers and Stalnaker,4 
dealt with a group of seven students who met for a total of thirteen 
sessions, varying in time from ten to forty minutes. Instruction was 
given concerning the various factors influencing reading rate and 
comprehension. The concept of improvability was emphasized 
during these instruction periods. The students were tested each 
session on the number of lines of Crawford’s The Technique of Study 
they could read during the session. The amount of time spent in this 
reading was increased during the course of the experiment, thus 
accounting for the variable time limits of each session indicated above. 
The average number of lines read per minute increased from 20.7 to 
25.8. A gain of twenty percentage points in comprehension was 
noted. 

More recent studies generally show more careful control of experimental
technique. Lauer,⁵ for instance, gave his group of three 
hundred fifty-five students a six-page mimeographed form on improve- 
ment of reading. This form consisted essentially of the following 
parts: (a) A preliminary discussion of the possibility of reading 
improvement, (b) a description of types of reading, (c) a method of 
calculating the number of words to a page and of timing the reading 
period, (d) some precautions as to sources of error in experimental 
techniques, (e) fifteen general principles for improving reading, and 
(f) a form for indicating results with instructions for making calcula- 
tions and tabulating the data. Each student spent twenty sessions 
in practice, using regular assignments in two or three courses as 
practice material. Five or ten minutes of each practice period were 
set aside for testing. Improvement was measured by determining the 
difference between the mean of the first three trials and the mean of 
the final three trials. The students increased their reading speed 
from 248 to 325.5 words per minute, a thirty-five and three-tenths 
per cent gain. Those who read more rapidly at the beginning generally 
showed the greater improvement. 

Bear,⁶ working with a group of one hundred twenty-seven students 
at Dartmouth, introduced several new procedures. His subjects were 
divided into small groups of from ten to thirty students, and met 
twice a week in one-hour sessions, for a period of five or six weeks. In 
addition to the usual lectures on factors which influence reading speed 
and comprehension, and on techniques in improving reading, Pressey’s 
Manual of Reading Exercises for Freshmen was used as practice mate- 
rial. Moreover, a projection lantern was utilized to flash words and 
sentences on a screen in order to increase eye-span. The Metro- 
noscope was used in order to develop rhythmic progression of eye 
movements. The author reported a gain of 37.7 words per minute 
in rate of reading, and an increase of eleven points in comprehension 
score as measured by the Iowa Reading Test. The initial status of 
the group was not indicated. 

Weber⁷ contrasted two techniques of increasing reading rate. One 
group of twenty-one students met once a week for periods of thirty 
to forty-five minutes. During the six weeks of the experiment, six 
hundred cards were flashed on a screen by means of a tachistoscope. 
Exposures varied from one-twentieth to one full second. The material
used ranged from nonsense materials to complex sentences. A 
second group of twenty-five students used Pressey’s Manual of Reading 
Exercises for Freshmen for the same length of time. Both groups were 
combined for initial and final testing. Special time limits were set 
for the Iowa Silent Reading Test. The groups showed a gain of 
34.1% in speed of reading. There was no apparent difference between 
the two methods. 

Simpson⁸ introduced a variation in procedure. In addition to 
meeting as a group once a week for a period of nine weeks, her subjects 
were also called in for individual conferences, each subject meeting 
with the experimenter at least once and some as many as six times. 
Group activities consisted of the usual lecture and discussion of the 
importance of reading, techniques of improving reading, etc. In 
addition, Strang’s Study Type of Reading Exercises was used. One 
to four chapters were read in each period, depending upon the time 
left from other activities. Some of Pressey’s Manual of Reading 
Exercises for Freshmen were also used. The median rate of reading 
increased from three hundred fifty-five to seven hundred forty words 
per minute, with no apparent loss in comprehension. 

The use of films as a device for improving speed of reading was 
reported by Dearborn and Wilking⁹ in a series of three experiments 
conducted at Harvard. In the first experiment, a group of sixteen 
students received class instruction in reading techniques for a period 
of eight weeks. Motion pictures previously developed by Dearborn 
and Anderson¹⁰ were used to teach phrasing and to increase the reading 
span. Practice in reading phrases of gradually increasing length 
was also given. The median rate of the group increased from two 
hundred fifty-one to three hundred eighty-two words per minute. 

In the second experiment, sixty-six students met for one hour a day, 
five days a week, for a total of four weeks. The experimental pro- 
cedure used leaned heavily on the presentation of reading films. 
Additional work was given on both speed and comprehension by 
utilizing textbook and literary materials. The median rate of this 
second group increased from two hundred forty-eight to three hundred 
thirty-three words per minute. 

In the third experiment, a group of thirty students met three times 
a week for six weeks, spending fifty minutes in each session. In 
addition to the films, which were run at a speed varying from one 
hundred fifty to four hundred words a minute, a corrective reading 
manual was used. The manual was divided into three subject-matter 
divisions—history, English, and science. These in turn were divided 
into textbook material and primary source material, similar to that 
which the college student is expected to read. The manual was read 
for speed, with no check upon comprehension. This group increased 
in rate of reading from 213.04 to 316.7 words per minute. 

As this brief consideration of the work in the field indicates, most 
recent studies used a more or less clinical approach to the problem of 
increasing speed of reading. Films, tachistoscopes, and other devices 
for improving reading techniques were utilized. Moreover, none 
of the recent studies was conducted in an actual class situation. 
Students chosen for experimental work met after the usual school 
day, spending as much as twenty hours (in one study reported above) 
in practice sessions. The investment in terms of materials, apparatus, 
and time is quite beyond the range of feasibility for most college 
teachers. 

The two experiments reported here represent an attempt to deter- 
mine whether comparable results can be achieved in a typical class 
situation, where the necessity of covering a certain amount of course 
content makes extended experiments impossible, and where expensive 
apparatus is not available. 


THE FIRST EXPERIMENT 


Fourteen students enrolled in a class in educational psychology 
at the College of the City of New York constituted the subjects in 
the first experiment. The topic of improving reading techniques was 
introduced during a discussion of study habits. Since most of the 
students indicated dissatisfaction with their speed of reading, it was 
decided to undertake a training program designed to increase read- 
ing rate. While no extended discussion of improving speed was 
attempted, the following points were made: (a) Most students can 
improve in reading even though they have already reached a relatively 
high degree of speed. (b) Pointing, head movements, whispering, 
and lip movement are unnecessary and tend to slow up reading. 
Speed can be improved if these useless motions are discarded. (c) 
Rapid reading is the result of few and short fixations. Long stops 
on individual words should be eliminated. The student should try 
to see phrases, even entire sentences at a glance. (d) Reading should 
be used not merely as a passive attempt to absorb the ideas of the 
author, but rather as an attempt to anticipate what the author is 
going to say. (e) The student should attempt to read at a “‘faster- 
than-convenient”’ rate. 

Strang’s Study Type of Reading Exercises¹¹ was used for practice 
material. The Strang exercises consist of twenty simply written 
passages of one thousand words, similar in form, each of which deals 
with the theory of reading and study. The author indicates that the 
level of difficulty, in terms of vocabulary and sentence structure, is 
definitely suited to the majority of high-school students. 

Ten minutes at the beginning of the next fourteen class meetings 
were set aside for experimental work. Each student read one passage 
in the booklet as rapidly as he could. Passages were systematically 
rotated among the students thus: Student A began with Exercise 1, 
Student B with Exercise 2, etc. Time was indicated on the black- 
board by the instructor every five seconds. Each student was thus 
able, by means of a simple computation, to determine his reading rate 
in words per minute and keep an individual chart of progress. 
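
The "simple computation" mentioned above can be sketched as follows.
This is only an illustration: the function name and the sample figures
are assumptions and do not appear in the original report.

```python
def words_per_minute(word_count: float, seconds: float) -> float:
    """Convert a timed reading into a rate in words per minute."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return word_count * 60.0 / seconds


# Hypothetical reading: a 1,000-word passage finished at the 90-second mark.
print(round(words_per_minute(1000, 90), 1))  # 666.7
```

Because the time was posted every five seconds, a student's recorded
time, and hence the computed rate, is accurate only to that interval.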

Each exercise in the Strang booklet is followed by a series of ques- 
tions designed to check comprehension. The first ten exercises are 
followed by three multiple-choice questions; the second ten by three 
completion questions. In order to arrive at a quantitative measure 
of comprehension, each multiple-choice question was arbitrarily 
assigned a value of five points. The same value was placed upon the 
completion questions. In the case of the latter, partial credit was 
awarded by the instructor, who marked all answers. 

The results are given in Table I.* 

If the scores made on the first trial are eliminated because of the 
subjects’ necessity of becoming familiar with the experimental situa- 
tion, the results indicate an average increase (Trial 2 to Trial 14) 
from 382.7 to 661.4 words per minute, a gain of 72.8 per cent. The 
group made a slight gain in comprehension. 
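
The percentage gain quoted above can be checked with a short
calculation. The sketch below assumes the gain is taken relative to
the Trial 2 rate; the function name is illustrative only.

```python
def percent_gain(initial_rate: float, final_rate: float) -> float:
    """Gain of final over initial, expressed as a percentage of initial."""
    return (final_rate - initial_rate) / initial_rate * 100.0


# Average rates for Trial 2 and Trial 14 as reported in the text.
print(round(percent_gain(382.7, 661.4), 1))  # 72.8
```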

It is interesting to note the effect that knowledge of the type of 
comprehension questions has in influencing reading speed. The 
students were aware of the fact that the first ten passages were tested 
by means of multiple-choice questions and the last ten passages by 
means of completion questions. The underlined ‘‘steps” in Table I 
indicate changes from one type of passage to the other; the upper 
series of steps referring to changes from the multiple-choice to the 
completion type, the lower series to changes from the completion to 
the multiple-choice type. In both instances, the change from one 
type to another resulted in increasing the time taken to read the 
passage. The average increase from multiple-choice to completion 
type passages was 16.5 seconds and the average increase from comple- 
tion to multiple-choice type passages was 13.6 seconds. The average 
increase in time taken, including all changes from one type to another 
was 15.3 seconds. This, of course, represents a sizeable difference in 
terms of the number of words read per minute. Apparently, the 
students did not have confidence in their ability to maintain the rate 
which they had already developed when confronted with a different 
type of question designed to test comprehension. 

Several factors, operating independently or jointly, might account 
for the large gain noted in reading rate. The increase might be due 
to: (a) The daily testing, which provided a speeded reading situation 
on material one thousand words in length; (b) the knowledge gained 
from mastering the content of the Strang exercises, which dealt with 
reading techniques; or (c) outside practice in rapid reading. With 
regard to the last factor, only two or three students reported any 
outside practice whatever, and then a negligible amount. It is 
impossible, of course, to disentangle the other two factors. The 
second experiment attempts to eliminate this difficulty by using 
practice material which does not introduce the possibility of improvement
through mastery of content. 

* A table of reading speeds, giving the reading rates for passages from nine 
hundred to twelve hundred words in length within the time limits of sixty to four 
hundred seconds, can be secured by request from Dr. Goldstein. 

TABLE I. RATE AND COMPREHENSION SCORES ACHIEVED USING STRANG EXERCISES

Time score (in seconds)

Trial                                 1      2      3      4      5      6      7      8      9     10     11     12     13     14
Subject
A                                   180    135    145    150    130     60     90    105     75     90     95     75    110     85
B                                   220    160    170    140    150     80    150    110    115    140    115    105    105     95
C                                   165    160    135    145    125     90     85     90    160     75     90    100     60     55
D                                   135    130    115    105     90     60     80     90     80     80     90     80     90  [incomplete]
E                                   120    110    130    115    110     55     30     70     70     55     60     50  [incomplete]
F                                   155    155    145    130    115    105    110    110     95     95     95     90  [incomplete]
G                                   175    165    145    140    135    130    105    105     95     95     90    115     85     90
H                                   150    110     95    130    110     65     65     80     75     85     80   8[?]     70     65
I                                   170    130    130    125    125    100     95     95     85     95     75     80     95     80
J                                   185    195    140    125    135     55     55     75     70     65     70     70     65     65
K                                   295    215    [?]    210    220    130    175    165    165    150    165    160    175    160
L                                   225    220    245    205    185     90    160    130    165    160    150    145    145    150
M                                   190    165    160    160    150     90     70     90     95     90     90     95    110    105
N                                   170    145    155    140    150     20     40    110    115    115    115    110    110  [incomplete]

Average time (seconds)            176.1  156.8  151.8  144.3  137.9   78.9   96.8  101.3  108.1  100.4  101.4   98.9   97.1 90.[?]
Average rate (words per minute)   370.7  382.7  308.3  415.9  435.3  761.1  620.0  589.5  555.2  597.9  591.9  608.6  617.7  661.4
Average comprehension score        12.1   12.7   13.0   12.4 13.[?]   11.8   13.1 14.[?]   14.2 13.[?] 14.[?]   13.6 12.[?]   14.1

[A question mark in brackets marks an illegible figure. Rows D, E, F, and N
are incomplete; their surviving values are given in printed order and may
not fall under the correct trial columns.]

A further limitation of the use of the Strang exercises for research 
purposes lies in the paucity of the number of questions used to test 
comprehension. Three multiple-choice or three completion questions 
do not represent an adequate challenge to the average college student. 
The second experiment makes use of ten true-false questions as a 
measure of comprehension. 

One additional point is worthy of note. The correlation between 
initial rate of reading and improvement in rate is .20, indicating a 
slight tendency for better readers to make the greater gain, even at 
the high level at which this group as a whole begins practice. Care 
must be taken, however, in determining the basis upon which such 
correlations are computed. If, for instance, initial time in seconds is 
correlated with improvement in time, the result is a coefficient of 
-.52, indicating that initially slower readers tend to reduce their 
time more than initially rapid readers. This apparent contradiction, 
of course, is due to the fact that while those subjects who take a longer 
time to read the material originally reduce their time markedly, the 
gain in rate is not directly proportional to the decrease in time. Thus, 
a decrease from sixty-five to sixty seconds in the time taken to read 
one thousand words represents an increase of approximately seventy- 
five words per minute in rate of reading, while a decrease from one 
hundred twenty-five to one hundred twenty seconds represents an 
increase of only twenty words per minute. Improvement must be 
measured directly in terms of words per minute. 
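
The arithmetic behind this example can be reproduced directly. The
sketch below uses the passage length and times given in the text; the
function names are illustrative and not part of the original study.

```python
def words_per_minute(word_count: float, seconds: float) -> float:
    """Reading rate in words per minute for one timed passage."""
    return word_count * 60.0 / seconds


def rate_gain(word_count: float, seconds_before: float, seconds_after: float) -> float:
    """Change in rate produced by a change in reading time."""
    return (words_per_minute(word_count, seconds_after)
            - words_per_minute(word_count, seconds_before))


# The same five-second saving on a 1,000-word passage, at two speeds.
fast_reader = rate_gain(1000, 65, 60)
slow_reader = rate_gain(1000, 125, 120)
print(round(fast_reader, 1))  # 76.9
print(round(slow_reader, 1))  # 20.0
```

Since rate varies inversely with time, equal reductions in time yield
larger gains in rate for the faster reader, which is why the two
correlations can differ in sign.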


THE SECOND EXPERIMENT 


A larger group of thirty students, also enrolled in a class in educa- 
tional psychology, were the subjects in the second experiment. The 
same procedure was used in introducing the experiment, and the same 
instructions were given to the group. However, several changes in 
technique were made. 

Instead of utilizing the Strang exercises, each student was asked to 
bring to class a passage of approximately one thousand words, chosen 
from magazine articles, newspaper editorials or features, etc. Ten 
true-false questions were to be prepared on the passage chosen. The 
material was arranged on eight by eleven sheets, page 1 containing 
the identification number and the number of words of the passage; 
pages 2 and 3, the article; page 4, the true-false questions; and page 5, 
the key for scoring the questions. Keys were checked for ambiguity 
and correctness by the instructor. 

There are several advantages gained by the use of this procedure: 
(a) Since the Strang exercises utilize only one type of reading material, 
the possibility of carry-over to other types of reading is reduced. The 
use of a wide variety of materials drawn directly from the outside 
reading of the students increases the possibility of such transfer. 
(b) The active participation of the student in the development of the 
material used in the experiment makes for greater motivation. (c) 
The possibility of improvement resulting from the content of the 
selections used is eliminated, since none of the materials drawn up by 
the students dealt, as did the Strang exercises, with techniques of 
improving reading. Such improvement as is found must be attributed 
to the interspersed testing. (d) A more accurate check upon com- 
prehension is provided. 

The procedure described in the first experiment was followed in 
this second experiment. The results are given in Table II. 


TABLE II. RATE AND COMPREHENSION SCORES ACHIEVED USING MISCELLANEOUS
READING MATERIALS

Trials                                 1       2       3       4       5       6       7       8       9      10      11      12      13      14

Average time (in seconds)          219.3   201.0   200.0   186.7   175.9   172.3   140.5   152.6   136.9   141.2   144.3   129.0   128.1   123.3
Average number of words read      1045.7  1067.5  1045.6  1055.6  1045.1  1059.1  1058.0  1059.0  1054.5   900.7  1056.5  1051.2  1050.2  1097.6
Average rate (words per minute)    286.1   318.7   313.7   339.3   356.6   368.8   451.8   416.4   462.2   382.9   439.4   488.8   491.9   533.9
Average comprehension score          8.4     8.4     8.8     8.9     8.3     8.7     8.9     8.6     9.4     9.0     8.6     8.4     8.6     8.7

Again considering improvement as the change from Trial 2 to 
Trial 14, the results indicate an increase from 318.7 to 533.9 words per 
minute, a gain of 67.5 per cent. The group again made a slight gain 
in comprehension. The correlation between initial rate of reading 
and improvement in rate was approximately .28, indicating once more a 
slight tendency for the initially better readers to make the greater 
gain. 

There seems to be little difference between the use of either the 
Strang exercises or the more informal material developed by the 
students. While the students using the Strang booklets made a 
slightly greater gain, it must be remembered that their initial level of 
ability was much higher, and greater improvement is to be expected. 
From the point of view of the teacher of educational psychology, 
the technique used in these two experiments has much to recommend 
it, apart from the gain in rate of reading, which may simply be a tem- 
porary result due largely to the motivational aspects of the testing 
situation. In addition to introducing students to experimental 
technique, the data and results obtained may be used to illustrate 
such common topics in educational psychology as learning curves, 
individual differences, and elementary statistical procedures. 


CONCLUSIONS 


(1) Reading rate of college students may be markedly improved 
in a typical class situation through a program of interspersed testing, 
without the use of expensive apparatus or extended experimentation. 

(2) A change in the method used to test comprehension, if expected 
by the student, reduces reading speed. 

(3) Initially better readers tend to make the greater improvement 
in rate of reading. 

(4) Correlations between initial status and improvement should be 
made in terms of rate of reading, as measured in terms of words per 
minute, rather than in terms of reduction in time taken. 


BIBLIOGRAPHY 


1. Stone, C. W. and Colvin, C.: “‘How to Study’ as a source of motive in
    educational psychology.” Journal of Educational Psychology, Sept., 1920,
    Vol. XI, pp. 348-354.

2. Stone, C. W.: “Improving the reading ability of college students.”
    Journal of Educational Method, Sept., 1922, Vol. II, pp. 8-23.

3. Averill, L. A. and Mueller, A. D.: “The effect of practice on the improvement
    of silent reading in adults.” Journal of Educational Research, Feb., 1928,
    Vol. XVII, pp. 125-129.

4. Remmers, H. H. and Stalnaker, J. M.: “An experiment in remedial reading
    exercises at the college level.” School and Society, Dec. 22, 1928,
    Vol. XXVIII, pp. 797-800.

5. Lauer, A. R.: “An experimental study of the improvement of reading by
    college students.” Journal of Educational Psychology, Dec., 1936,
    pp. 655-662.

6. Bear, R. M.: “The Dartmouth program for diagnostic and remedial reading
    with special reference to visual factors.” Educational Records Supplement,
    Jan., 1939, No. 12, pp. 69-88.

7. Weber, C. O.: “The acquisition and retention of reading skills by college
    freshmen.” Journal of Educational Psychology, Sept., 1939, Vol. XXX,
    pp. 453-460.

8. Simpson, R. H.: “Improving reading and related study skills of college
    women.” College English, Jan., 1940, Vol. I, pp. 322-332.

9. Dearborn, W. F. and Wilking, S. V.: “Improving the reading of college
    freshmen.” School Review, Nov., 1941, Vol. XLIX, pp. 668-678.

10. Dearborn, W. F. and Anderson, I. H.: “A new method for teaching phrasing
    and for increasing the size of reading fixations.” Psychological Record,
    Dec., 1937, Vol. I, pp. 459-475.

11. Strang, R.: Study Type of Reading Exercises. Teachers College (Columbia
    University), Bureau of Publications, 1935.



TEST RELIABILITY FOR WHAT? 
BENJAMIN S. BLOOM 


Board of Examinations, University of Chicago 


Reliability has been defined as the consistency with which a test 
measures what it measures. A reliability coefficient does not tell us 
what the test measures. Without further analysis one can not decide 
what specific degree of consistency is desirable or necessary for the 
purpose to which the test results are to be put. 

Kelley¹ has pointed out that when a test is to be used for group 
measurement purposes a reliability coefficient of .50 or higher is needed. 
When the test is to be used for individual measurement purposes a 
reliability coefficient of .94 or higher is needed. No one would ques- 
tion the desirability of high reliability coefficients at all times for 
educational measurements, but often we are confronted with the 
problem of what to do with evidence about students based on tests 
with reliability coefficients which do not satisfy the standards indicated 
above. This paper is an attempt to point out some methods whereby 
tests with low reliability coefficients can be utilized for making generali- 
zations about student behavior. 

It must be pointed out that tests may be utilized for such varied 
purposes as: The prediction of scholastic achievement, diagnosis of 
individual strengths and weaknesses, assignment of grades to students, 
selection of individuals for certain educational and vocational oppor- 
tunities, and the surveying of individual and group status. The pre- 
cision required of the test should in general correspond to the extent 
to which the test results will be used to reach decisions vitally affecting 
the student. Also, the number of categories into which students are 
to be distinguished on a distribution of test scores should serve to 
indicate the lowest reliability coefficient which will be acceptable. 

Preoccupation with precision in educational measurements has 
resulted in an emphasis on long and reliable tests for all purposes. 
When the length of a test is increased its reliability is improved. Each 
item in the test which is positively intercorrelated with the other items 
will improve the reliability of the test. Since positive correlations 
in educational tests and measurements are the rule rather than the 
exception, the summing of a large number of not uncorrelated items or 
subtests having low intercorrelations yields high reliability coefficients. 
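
The effect of test length on reliability can be illustrated
numerically. The sketch below uses the Spearman-Brown formula, a
standard psychometric result that the text itself does not name; it is
brought in here as an assumption, and it presumes that the added items
are comparable to those already in the test. The starting reliability
of .50 is a hypothetical figure.

```python
def lengthened_reliability(reliability: float, length_factor: float) -> float:
    """Spearman-Brown estimate of reliability when a test is made
    length_factor times as long with comparable items."""
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)


# Hypothetical short test with a reliability of .50, lengthened 2, 4, and 8 times.
for factor in (1, 2, 4, 8):
    print(factor, round(lengthened_reliability(0.50, factor), 2))
```

Under these assumptions a test with a reliability of .50 reaches .80
when made four times as long, although nothing has been learned about
what the lengthened test measures.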

Emphasis is placed on the total test score rather than on the many 
subtest scores which are contained within the total test. We tend 
to ignore the way in which the student approaches the problem, the 
types of questions which he answers correctly, and the types of ques- 
tions which he answers incorrectly. Adding together many relatively 
unrelated sub-abilities yields reliable total scores but may obscure 
much valuable information about the configuration of abilities and 
skills which an individual possesses. These abilities may not be 
additive, but perhaps should be combined in some unique fashion if 
we are to secure meaningful insights into the student’s behavior. 

To use a physical analogy, the volume of a box expressed in cubic 
inches or feet gives an adequate concept of the size of the box, but 
tells nothing about its shape. A specified volume might be secured 
by an infinite number of differently shaped boxes. We must know 
the length, width, and depth of the box before its shape is known. 
Also, the volume represents a unique combination of these dimensions 
of the box. Another combination of these same dimensions would 
yield the surface area of the box. However, a knowledge of these 
dimensions tells nothing about the color of the box, its weight, strength, 
cost, age, length of time required to make the box, whether it is air 
tight, etc. 

This analogy can be applied to the field of testing. A total reading 
score is frequently a mixture of many different elements of reading. 
Such a total score is not a direct index of the individual’s reading rate, 
his ability to comprehend material taken from various subject fields, 
his ability to comprehend groups of words in various organizations, 
physiological handicaps, eye movements, etc. Nor can we add scores 
for each of the above characteristics in any meaningful way. We 
must keep them separate and make our judgment about the appropriate 
remedial procedures by considering all the scores in some unique com- 
bination. For certain purposes an index composed of many different 
scores may be useful, but it must be remembered that this statistically 
useful index is a non-meaningful combination. 

Before we construct or use a test, we must first determine why we 
want evidence about the students’ aptitudes and achievements, and 
what evidence should be gathered. Then, having gathered the desired 
evidence, one should combine the various bits of evidence in some 
meaningful fashion. Simply to add various subscores to secure a total 
score may be not only meaningless but actually misleading. To secure 
a reliable test merely by adding different tests together may be a very 
vicious practice which gives a false sense of precision and scientific 
accuracy in tests and measurements. Users of tests sometimes confuse
high reliability coefficients with scientific accuracy in measurement,
and too often neglect a consideration of the extent to which a 
single score is an index of the specific ability or skill in which they are 
interested. 

However, as the total test is broken into smaller and smaller useful 
units, the reliability of each subtest score decreases. Thus we are 
faced with the dilemma of increasing the diagnostic value of a test 
by decreasing the reliability of the test units under consideration, or 
increasing the reliability of the test by decreasing its diagnostic value. 

Diagnostic testing is an important adjunct to teaching and educa- 
tional guidance. The teacher in order to do an effective job must first 
determine the particular aptitudes and achievement of students which 
are desired, measure progress in these aptitudes and achievements, 
and then, before the students leave the class, measure their final status 
in these aptitudes and achievements. Through such practices the 
instructor will best be able: Firstly, to help each student; secondly, 
to present his subject-matter in the most economical and efficient 
manner; and, finally, to revise the presentation of a particular course 
in the light of changes shown by students from the initial to the final 
test. 

To do this it will be necessary to use many tests, some of which 
will not be as long or as reliable as might be desired. In lieu of other 
evidence about the students and because the testing time is limited, 
the teacher must content himself with this material. The following 
will attempt to point out various ways in which these ‘‘unreliable”’ 
tests may be used with various degrees of confidence. 


I. USE OF TEST RESULTS WITHOUT MODIFICATION 


(A) For Developing Likely Hypotheses about the Student.—A short 
unreliable test dealing with some aspect of reading or, perhaps, with 
the types of errors students make in handling certain laboratory 
instruments may reveal an aspect of student behavior which the 
instructor has not taken into consideration. If no other evidence is 
available on these phases of student behavior, the instructor is con- 
fronted with the problem as to whether he should make use of this 
evidence or should dismiss it because of low reliability. 

In a situation of this type, great confidence can not be placed in 
this evidence, and we can not make any decision on this basis which 
would vitally affect the student. But, in the absence of contradictory 
evidence, we may use these test results as the basis for likely hypotheses 
about the student, taking care to recognize the error of measurement 
associated with the score. These hypotheses should be explored and 
checked in the light of subsequent evidence about the student. 

Although, on subsequent repetitions of comparable tests of this 
specific ability, it may be expected that the student’s score will 
fluctuate considerably, in the absence of further evidence it may be 
expected that the student’s scores will fluctuate around his present 
score. That is, in the absence of other evidence, the present score 
may be assumed to be the most likely score for the student on this 
ability. 

(B) For Making Decisions about the Student.—If the ability under 
consideration is one for which some definite decision must be made 
which will to a considerable extent affect the student, the instructor 
should seek out further evidence. Thus, if two or more partially 
independent scores are available, the instructor can make a judgment 
based on each score and a decision made on the basis of these two 
judgments will be more reliable than a decision made on only one of 
the scores. 

The test user can place increased confidence in his decisions if they 
are based on judgments made on a large number of partially independ- 
ent tests and if the judgments made on one of the tests are substan- 
tiated by the judgments made on the other tests. 

Instead of the reliability resulting from the collection of relatively 
unrelated items to make a single test score, the test user can sub- 
stitute the consistency of judgments from a number of test scores. 


II. STANDARD ERROR OF MEASUREMENT 


The reported reliability of a test is very misleading. Symonds²
has pointed out the many ways in which the reliability coefficient 
of a test may be affected. In particular, it is possible to secure high 
reliability coefficients by constructing tests with large numbers of 
items or by administering short tests to a large and extremely varied 
group of subjects. Thus, a short test intended for sixth-grade students 
may be administered to students in all grades from the first to twelfth 
grade, and, because of the heterogeneity of the students represented, 
a high reliability coefficient may be secured. Then again, a test 
intended for one school may be administered to a large number of 
extremely varied schools and the reliability coefficient for the total 
group of schools will be much higher than for a single school. 

As the heterogeneity of the group tested increases, both the standard
deviation and the reliability coefficient of the test increase. However,
the standard error of measurement, which is a function of the
standard deviation of the test scores and of the reliability coefficient
of the test, remains quite stable. A fictitious numerical example of
this stability is:


For one school
Standard deviation: 10          Reliability coefficient: .64

Standard error of measurement: $\sigma\sqrt{1 - r_{11}} = 10\sqrt{1 - .64} = 6$


For several schools
Standard deviation: 15          Reliability coefficient: .84

Standard error of measurement: $\sigma\sqrt{1 - r_{11}} = 15\sqrt{1 - .84} = 6$


In the above example, in spite of large differences in the standard 
deviations and the reliability coefficients, the standard error of meas- 
urement is constant. The standard error of measurement is an index 
of the scatter of the observed scores around the estimated true score. 
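
The stability described above can be checked with a few lines of arithmetic. The following is a minimal sketch only; the function name `standard_error_of_measurement` is introduced here for illustration and does not come from the article.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Return sd * sqrt(1 - reliability), the standard error of measurement."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# The two fictitious cases given in the text.
one_school = standard_error_of_measurement(10, 0.64)
several_schools = standard_error_of_measurement(15, 0.84)

# Both values come out at 6 (up to floating-point rounding).
print(round(one_school, 2), round(several_schools, 2))
```

The larger standard deviation of the heterogeneous group is offset by its higher reliability coefficient, so the product is unchanged.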

Before it is possible to determine whether a specified reliability 
coefficient is high enough to give satisfactory precision, something 
must be known about the heterogeneity of the group on which this 
coefficient was determined. The writer would recommend that the 
number of statistically significant classes into which the test scores 
may be divided be determined by the ratio: 


$$\frac{\text{Range of scores}}{3\sigma\sqrt{1 - r_{11}}}$$


This ratio indicates the number of categories which may be obtained
from this range of test scores so that the chance of a point in one
category overlapping with the corresponding point in the next category
is only about one in one thousand. The test user can then determine
whether this yields a sufficient number of statistically distinct cate- 
gories for the purpose to which he intends to put the results from this 
test. 

If a normal distribution is assumed in which the range of scores 
on a test is six times the standard deviation, it is possible to determine 
the number of distinct categories available for different reliability 
coefficients by the proper substitutions in the above formula. The 
following represents some possible substitutions: 


RELIABILITY      NUMBER OF
COEFFICIENTS     CATEGORIES
    .40             2.58
    .60             3.16
    .80             4.47
    .90             6.33
    .95             8.97
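
The substitutions in the table can be sketched in code. The sketch assumes that the divisor of the ratio is three standard errors of measurement, an inference from the tabled values and from the "one in one thousand" criterion, and that the range is six standard deviations as stated. The function name `distinct_categories` is hypothetical.

```python
import math


def distinct_categories(reliability: float, range_in_sd: float = 6.0) -> float:
    """Number of distinct categories: range / (3 * standard error of measurement).

    Everything is expressed in standard-deviation units, so the standard
    error of measurement is sqrt(1 - reliability).
    Assumption: the divisor is three standard errors of measurement.
    """
    sem_in_sd = math.sqrt(1.0 - reliability)
    return range_in_sd / (3.0 * sem_in_sd)


for r in (0.40, 0.60, 0.80, 0.90, 0.95):
    print(f"{r:.2f}  {distinct_categories(r):.2f}")
```

Under these assumptions the computed figures agree with the tabled ones to within a few hundredths; the entry for .95 comes out nearer 8.94 than 8.97.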







III. USE OF EXTREME SCORES 


A profile of test scores is usually analyzed by pointing out the tests 
on which the student is extreme. Thus the scores on an interest 
test are analyzed by pointing out those interests in which the individual 
deviates markedly from the average. 

If a normal distribution is assumed and the raw test scores are 
transmuted to percentile ranks, certain characteristics of the distribu- 
tion are altered. Equal raw score units are being exchanged for rank 
units in which the number of people rather than the size of the score 
is the unit. As the extreme percentile ranks are approached, the 
distance in raw score units from the mean becomes increasingly large. 
Thus the difference between percentile ranks of 50 and 55 in terms 
of raw scores is much less than the raw score difference between per- 
centile ranks of 90 and 95. 

If the standard error of measurement of percentile ranks is com- 
puted, it will be found that the precision with which we can speak 
about a score is a function of the distance of that score from the mean 
and of the reliability of the test. Table A shows the fiducial limits 
associated with specified percentile ranks and specified reliability 
coefficients. If these limits are adopted, such estimates of the true 
score will be right ninety-five per cent of the time. 

If the reliability of a test is .90 and a student’s percentile rank 
is 50, the true percentile rank may be found (ninety-five out of one 
hundred times) within the limits 27 and 73 or a range of forty-six 
percentile points. For the same reliability coefficient, a percentile 
rank of 90 may (ninety-five out of one hundred times) really be some- 
where within the limits 75 and 97 or a range of twenty-two points. 
Contrast this with the limits within which the true rank may be found 
ninety-five out of one hundred times for other ranks and other reli- 
ability coefficients. 
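
The limits quoted above appear consistent with a simple computation on the normal curve: convert the percentile rank to a standard score, step 1.96 standard errors of measurement to either side, and convert back to percentile ranks. This reading is an inference from the quoted figures rather than a procedure the article spells out, and the name `fiducial_limits` is hypothetical. A minimal sketch:

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def fiducial_limits(percentile: float, reliability: float, z_crit: float = 1.96):
    """Return (lower, upper) percentile-rank limits for an observed rank.

    Assumption: the limits lie z_crit standard errors of measurement,
    sqrt(1 - reliability) in standard-deviation units, on each side of
    the observed standard score.
    """
    z = STANDARD_NORMAL.inv_cdf(percentile / 100.0)
    half_width = z_crit * math.sqrt(1.0 - reliability)
    lower = STANDARD_NORMAL.cdf(z - half_width) * 100.0
    upper = STANDARD_NORMAL.cdf(z + half_width) * 100.0
    return lower, upper


for rank in (50, 90):
    low, high = fiducial_limits(rank, 0.90)
    print(rank, round(low), round(high))
```

For a reliability of .90 this gives limits of about 27 and 73 for a rank of 50, and about 75 and 97 for a rank of 90, matching the figures in the text. The interval narrows at the extremes because equal steps in standard score cover fewer percentile points in the tails.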

As the ranks become more and more extreme we can have greater 
and greater confidence in a narrower and narrower range of possible 
true ranks. Although greatest confidence can be placed in the most 
extreme rank on a test with the highest reliability, much confidence 
can be placed in extreme ranks even when the reliability of the test 
is low. Thus we can speak with as much confidence about a percentile
rank of 98 when the reliability coefficient is .60, as we can speak about 
a percentile rank of 85 when the reliability coefficient is .95. This 
holds true for ranks at the lower extremity as well as for those at the 
higher extremity. 













TABLE A. GIVEN A SPECIFIED RELIABILITY COEFFICIENT AND SPECIFIED
PERCENTILE SCORE BASED ON A NORMAL DISTRIBUTION, THE FIDUCIAL
LIMITS WHICH MAY BE EXPECTED NINETY-FIVE OUT OF ONE HUNDRED
TIMES

Percentile                         Reliability coefficients
score       Limit    .40    .50    .60    .70    .80    .85    .90    .95

99          Upper    999+   999+   999+   999+   999    999    998    997
            Lower    791    827    861    895    926    942    956    971
98          Upper    999+   999+   999+   999    998    998    996    994
            Lower    704    748    792    837    881    902    924    947
97          Upper    999+   999    999    998    997    996    994    990
            Lower    642    690    739    790    842    867    869    896
96          Upper    999+   999    999    998    996    994    991    986
            Lower    592    642    695    751    809    839    871    905
95          Upper    999    999    998    997    994    992    988    981
            Lower    550    602    657    716    779    812    847    886
90          Upper    997    996    994    991    985    979    971    957
            Lower    407    460    517    582    657    699    746    801
85          Upper    995    992    989    983    972    964    951    930
            Lower    315    363    420    485    564    609    662    725
80          Upper    991    987    981    972    957    945    928    900
            Lower    249    293    345    408    486    533    588    657
70          Upper    980    972    961    945    919    900    874    832
            Lower    160    195    237    291    362    407    462    534
60          Upper    962    949    932    908    871    844    809    755
            Lower    103    129    162    206    267    307    357    427
50          Upper    936    917    892    859    810    776    732    669
            Lower    065    083    108    142    190    224    268    331

NOTE: The complements of the above figures may be used for percentile scores
below 50.


Thus, short and unreliable tests can be utilized for quite precise 
generalizations about individuals with extreme ranks, but only for 
the formulation of venturesome hypotheses about individuals who 
receive non-extreme ranks.







IV. USE OF A COMBINATION OF PROBABILITIES 


If an individual student has taken a number of tests and a profile 
is drawn to represent his scores, we might want to know whether it is 
possible that he might really have received the mean score on these 
tests. That is, we would like to know the probability that this 
particular profile of abilities and disabilities is due to the errors asso- 
ciated with the tests, rather than to the presence of true abilities or 
disabilities. 

The standard error of measurement can be used to determine the
probability that one score is different from another in the same
distribution. The difference between two scores may be divided by
the standard error of measurement and the ratio referred to the $x/\sigma$
column of a normal probability integral. Thus, if Individual A’s score
is 38, the mean in this case is 30, and the standard error of
measurement is 3, then $\frac{38 - 30}{3} = 2.67$. The probability of these two scores
being different is 9924 out of 10,000. It is reasonable to suppose that
these two scores are statistically different. It is also possible to
determine the probability that on the other tests taken by this
individual his score is different from the mean by following the above
procedure.
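
The worked figure above can be reproduced as follows. This is a sketch only: the function names are hypothetical, and the probability is taken as the central area of the normal curve between minus and plus the critical ratio, which is the reading that matches the figure of 9924 in 10,000.

```python
from statistics import NormalDist


def critical_ratio(score: float, reference: float, sem: float) -> float:
    """Difference between a score and a reference value, in units of the
    standard error of measurement."""
    return (score - reference) / sem


def probability_of_difference(cr: float) -> float:
    """Area of the standard normal curve lying within +/- cr of the mean."""
    return 2.0 * NormalDist().cdf(abs(cr)) - 1.0


# Individual A: score 38, mean 30, standard error of measurement 3.
# The ratio is rounded to two decimals, as in the text.
cr = round(critical_ratio(38, 30, 3), 2)
p = probability_of_difference(cr)
print(cr, round(p, 4))
```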

Having determined the probability that each of the scores received
by this individual is different from the mean, it is now desired to deter-
mine the probability that this combination of scores is different from a
combination of mean scores on these tests. Lindquist³ has pointed
out a method whereby the probabilities from independent tests of
significance can be combined. This is done by converting each prob-
ability to a chi-square value and then determining the probability
of getting the sum of these chi-squares by chance.
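
The chi-square combination just described can be sketched as below. The sketch assumes that the method is the one usually attributed to R. A. Fisher, in which each probability p is converted to -2 ln p and the sum is referred to chi-square with two degrees of freedom per probability; that identification is an assumption here, not something the text states. The function names and the sample probabilities are hypothetical.

```python
import math


def chi_square_upper_tail_even_df(x: float, df: int) -> float:
    """Upper-tail probability of chi-square for an even number of degrees
    of freedom, using the closed form exp(-x/2) * sum((x/2)^i / i!)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def combine_probabilities(p_values):
    """Return (chi-square sum, combined probability) for independent tests.

    Assumption: each p is the chance probability from one independent
    test of significance, and each contributes -2 ln p on 2 df.
    """
    chi_square = -2.0 * sum(math.log(p) for p in p_values)
    df = 2 * len(p_values)
    return chi_square, chi_square_upper_tail_even_df(chi_square, df)


# Hypothetical chance probabilities from three independent tests.
chi_square, combined_p = combine_probabilities([0.05, 0.01, 0.03])
print(round(chi_square, 2), combined_p)
```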

A more direct technique for appraising the significance of a number
of critical ratios may be determined. If we have

$$\text{Sum} = \frac{x_1}{\sigma_1} + \frac{x_2}{\sigma_2} = Z_1 + Z_2 = CR_1 + CR_2$$

then, if the tests are unrelated,

$$\sigma_{\text{sum}}^2 = \sigma_{Z_1}^2 + \sigma_{Z_2}^2$$

$$\sigma_{\text{sum}} = \sqrt{\sigma_{Z_1}^2 + \sigma_{Z_2}^2}$$

Since $\sigma_{Z_1}^2 = 1$ and $\sigma_{Z_2}^2 = 1$,

$$\sigma_{\text{sum}} = \sqrt{1 + 1} = \sqrt{2}$$

then

$$\frac{Z_1 + Z_2}{\sqrt{2}} = \text{new critical ratio}$$

Thus, to test the significance of a number of critical ratios it is necessary
to determine their sum and divide by the square root of the number of
critical ratios in the sum. See the following example:





           $\sigma\sqrt{1 - r_{11}}$    Mean    Individual A’s score    Critical ratio

Test 1              4              15             23                  2.00
Test 2              3              17             25                  2.67
Test 3              5              10             21                  2.20

                                                          Sum         6.87











Since $\frac{6.87}{\sqrt{3}} = 3.97$, the probability of getting a critical ratio as great
as this by chance is less than one in one thousand times.
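
The example can be retraced in code. This is a sketch under the article's stated assumption that the tests are unrelated; the function name `combined_critical_ratio` is hypothetical, and each ratio is rounded to two decimals as in the table.

```python
import math


def combined_critical_ratio(critical_ratios):
    """Sum of the critical ratios divided by the square root of their number
    (valid only if the tests are unrelated)."""
    return sum(critical_ratios) / math.sqrt(len(critical_ratios))


# (standard error of measurement, mean, Individual A's score) for each test.
tests = [(4, 15, 23), (3, 17, 25), (5, 10, 21)]

# Critical ratio on each test, rounded to two decimals as in the table.
ratios = [round((score - mean) / sem, 2) for sem, mean, score in tests]
combined = combined_critical_ratio(ratios)

print(ratios, round(sum(ratios), 2), round(combined, 2))
```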

It is obvious that the comparisons need not always be made from 
the mean. If the test user has found that individuals with a certain 
profile of test scores are ordinarily aided most by certain remedial 
measures or should be given certain types of guidance, the above 
procedure should be useful in determining the probability that the 
scores for a given individual are like or different from the profile of 
scores in question. When the comparison is with another score, the 
difference between the two scores should be divided by $\sqrt{2}$ times the
standard error of measurement. 
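
A minimal sketch of the comparison between two scores follows. It assumes that both scores carry the same standard error of measurement and that their errors are independent, which is what the divisor of $\sqrt{2}$ implies; the function name and the sample figures are hypothetical.

```python
import math


def critical_ratio_between_scores(score_a: float, score_b: float, sem: float) -> float:
    """Critical ratio for the difference between two observed scores.

    Assumption: both scores have the same standard error of measurement,
    so the standard error of their difference is sqrt(2) * sem.
    """
    return (score_a - score_b) / (math.sqrt(2.0) * sem)


# Hypothetical figures: two observed scores of 38 and 30, with a
# standard error of measurement of 3 for each.
print(round(critical_ratio_between_scores(38, 30, 3), 2))
```

The ratio is smaller than the one obtained against a fixed mean because both scores now contribute error.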

This procedure makes possible the testing of a number of hypoth- 
eses about the scores for a particular individual. Since it can be 
applied to a set of scores regardless of the size of the reliability coeffi- 
cients, it should prove useful in the analysis of any type of profile 
where the characteristics in question can be expressed in quantitative 
terms. 

The above method will yield lower probabilities of scores being 
different than is actually the case. Since most achievement tests are 
positively correlated, high scores on one test will quite likely be associated
with high scores on another test. Thus the individual who has
a certain score on one test will tend to make certain scores on other 
related tests. The method described above was based on the assump- 
tion that the test scores are unrelated; when related measures are used, 
a slightly different procedure should be adopted. 

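If the tests are related, the significance of a number of critical
ratios may be derived.

If we have

$$\text{Sum} = \frac{x_1}{\sigma_1} + \frac{x_2}{\sigma_2} = Z_1 + Z_2 = CR_1 + CR_2$$

then

$$\sigma_{\text{sum}}^2 = \sigma_{Z_1}^2 + \sigma_{Z_2}^2 + 2r_{12}\sigma_{Z_1}\sigma_{Z_2}$$

$$\sigma_{\text{sum}} = \sqrt{\sigma_{Z_1}^2 + \sigma_{Z_2}^2 + 2r_{12}\sigma_{Z_1}\sigma_{Z_2}}$$

Since $\sigma_{Z_1}^2 = 1$ and $\sigma_{Z_2}^2 = 1$,

$$\sigma_{\text{sum}} = \sqrt{1 + 1 + 2r_{12}} = \sqrt{2(1 + r_{12})} = \sqrt{2}\sqrt{1 + r_{12}}$$

If we assume that $r_{12} = r_{13} = r_{23}$, etc., then

$$\sigma_{\text{sum}} = \sqrt{N}\sqrt{1 + (N - 1)r_{12}}$$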


where N = number of critical ratios. The average correlation 
between the critical ratios can hardly be greater than the average 
reliability coefficients of the tests involved, so this may be used as a 
conservative upper bound of the intercorrelations. 

Thus we may proceed as before to find the critical ratio on each test 
for the individual in question, sum these critical ratios and divide 
by $\sigma_{\text{sum}}$ to secure the critical ratio of the sum.* This value may be
referred to the normal probability integral to determine the probability 
that such a set of critical ratios might be obtained by chance. 
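
The procedure for related tests can be sketched as follows. The function names are hypothetical, and the intercorrelation of .50 used in the illustration is an assumed figure, not one from the article; the critical ratios are those of the earlier example.

```python
import math


def sigma_sum(n: int, mean_r: float) -> float:
    """Standard deviation of a sum of n unit-variance critical ratios
    sharing a common intercorrelation mean_r:
    sqrt(n) * sqrt(1 + (n - 1) * mean_r)."""
    return math.sqrt(n) * math.sqrt(1.0 + (n - 1) * mean_r)


def combined_critical_ratio_related(critical_ratios, mean_r: float) -> float:
    """Critical ratio of the sum when the tests are related."""
    return sum(critical_ratios) / sigma_sum(len(critical_ratios), mean_r)


ratios = [2.00, 2.67, 2.20]   # critical ratios from the earlier example
assumed_r = 0.50              # hypothetical average intercorrelation

print(round(combined_critical_ratio_related(ratios, assumed_r), 2))
```

With an intercorrelation of zero the divisor reduces to the square root of N, so the unrelated case is recovered; any positive intercorrelation enlarges the divisor and lowers the critical ratio of the sum.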











BIBLIOGRAPHY 


1. Kelley, T. L.: Interpretation of Educational Measurements, pp. 210-211. World
Book Company, 1927.
2. Symonds, P. M.: “Factors Influencing Test Reliability.” Journal of Educational
Psychology, Vol. XIX, pp. 73-87.
3. Lindquist, E. F.: Statistical Analysis in Educational Research, p. 47. Houghton
Mifflin Company, 1940.





* The writer is indebted to Dr. John H. Smith of the University of Chicago for 
certain suggestions on the significance of a sum of critical ratios. 













THE USE OF FIELD STUDIES IN TEACHING 
EDUCATIONAL PSYCHOLOGY 


KARL P. ZERFOSS
Professor of Psychology, George Williams College, Chicago, Ill.


AND
HARRIET D. MOORE
Student in Group Work Education, George Williams College, Chicago, Ill.


THE METHOD 


This report relates to the teaching of the first course in Educational 
Psychology in which primarily the psychology of learning is empha- 
sized. In this course the attention of the students is directed to a 
unit of experience wherein the principles and facts of learning are 
focused. Herein these principles are illustrated, both by their adequate
application and the lack of it, within a framework in which
the students have a practical interest beyond the limits of the course 
itself. 

For most of the students the experience selected is taken from their 
field work activities which are being carried in some social agency in 
Chicago. The majority of the students in the class are remunerated 
for such services. The point to be stressed is that these experiences 
are real ones so that the meanings involved do not need to be trans- 
ferred later to some actual situation. The students are urged to 
make this project suffice for the class and at the same time provide 
a way of improving the work they must do in the course of their jobs. 

At the beginning of the course instructions are given regarding 
the nature of the project, its purpose and procedures, including the 
relating of content to field experience. Field studies from previous 
years are made available to the students and are referred to at this 
stage when the project is outlined. While the literature is limited as 
to similar projects there are a few to which students are referred for 
aid. Chief among these is Pressey’s Psychology and the New Education, 
especially the section (pp. 344ff.) which describes the way in which a 
group of little girls work a jig-saw puzzle and the analysis of this in 
terms of learning psychology. 

A guide sheet is then presented upon which students describe 
the unit of experience to which they have decided to give their atten- 
tion. In this way the instructor is able to give counsel upon the 


desirability of the selected unit. 







In this course a textbook is used with additional readings required.
For two years Mursell’s Educational Psychology has proven to suit
the purposes excellently. As the quarter proceeds the students are
required to draw out and list the major principles and facts presented
in the text. These are discussed in class sessions and illustrated from
experience. Many of the illustrations are given by the students 
from their units already selected. 

About half-way through the course a number of students present 
their projects before the class and discuss the insights involved. 
About the same time a few small group sessions are provided for 
student discussions of their projects and procedures. 

Before the project is written up in final form students are instructed 
to select from the large number of principles and facts, drawn from 
the text, a smaller number of these which are to be used in their 
papers. Here the procedure calls for a discussion of the experiences 
of the unit in light of these selected facts and principles. 

The instructor’s main job is to lead the students to choose fruitful 
projects, manage the discussion of the text, supplement this with 
lectures upon certain topics, and guide students in the preparation 
of their project descriptions. 

In evaluation of this method nothing very exact has been attempted. 
However, the students do about as well on the Potthoff-Corey Test 
in Educational Psychology as students are expected to do and it 
seems from experience that many of them begin to get fundamental 
insight into the learning process for the first time. Some of the 
students have difficulty in seeing the implications of principles involved 
in their field activities, while others are able to objectify and discuss 
these latent meanings with clarity and skill. Generally the poorer 
students have difficulty with the process. Some of them never seem 
able to do this work satisfactorily. The instructor has deepened 
his own understanding and has widened the breadth of his perspective 
regarding the learning process during these years of working with 
this type of course where theories and principles are being related 
to the activities of real people in actual life situations and where these 
activities are made meaningful in terms of generalizations. 

Included among the topics selected by students during the last
few years were: “The Application of Educational Psychology to the
Learn-to-Swim Campaign”; “An Analysis of Folk Dancing in Terms
of the Learning Process”; “Scouting and Learning”; “A Comparison
of Learning in a Club and a Class”; “Analysis of Sailing Instruction
in a Summer Camp”; “A Comparison of the Studio and the Classroom
Method of Teaching Art”; “Principles of Economy, Permanence
and Transfer as Applied to Teaching Motor Skills”; “Learning in
Basketball”; “Analysis of Learning En Route to and from the Job.”

From the papers submitted by students in recent years the one 
which follows illustrates the possibilities of the method just described. 
This paper is one of the more satisfactory reports submitted in this 
connection. It was written by Miss Harriet D. Moore, an under- 
graduate student in Group Work Education at George Williams 
College in Chicago. She is employed, on a part-time basis, at a home 
for delinquent girls in Chicago. The craft class which she analyzes for 
its learning values was initiated as a part of the regular program of 
the institution. This course in educational psychology happened 
to come just as the craft class began and Miss Moore selected the latter 
as the field experience upon which she would bring to bear insights 
from the psychology course. The paper is just as it was submitted 
by the student-author. It has been read and approved by the Director 
of the Institution involved. 





ANALYSIS OF A CRAFT CLASS IN TERMS OF CERTAIN PRINCIPLES AND 
FACTS FROM EDUCATIONAL PSYCHOLOGY 


Introduction.—The class is composed of delinquent girls between 
the ages of fourteen and eighteen who have been committed to the 
agency by juvenile courts. Commitments vary from twelve to 
eighteen months. The personality and conduct problems of the 
individuals in this group are serious in character and many in number. 
The girls come from homes in which training was inadequate, where 
low standards of living were extremely common, and where depriva- 
tion was the rule rather than the exception. 

Most of the girls are overtly aggressive in their behavior in order 
to cover up feelings of inferiority and insecurity. Some of them are 
boisterous, quick-tempered, selfish, uncooperative, and destructive,
while others are of a withdrawing, timid nature. A few are of superior 
intelligence, but the greater number are of low-average intelligence 
and several are mentally retarded. 

The number in the group varies constantly, depending on the 
release and intake fluctuations of the agency. There is rarely a group 
smaller than fifteen and occasionally the membership may run as high as 
thirty-two. The class meets three afternoons and one evening each 







week and the class periods vary from forty-five minutes to one and a
half hours. The unit has been in progress a little over four months.

Because of the necessary regimentation and routine in institutional
life the introduction of any new activity is usually accepted whole-
heartedly. We know from past experiences with these girls that their
interest range is very limited and that the interest span is short.
Since they showed some interest in various forms of handwork, we 
decided to open up a craft room as an experiment. It was felt that 
the opportunities for creative activity would serve as a release for 
tensions and that a wider range of interests might be developed. 
There were many features about such a project that would contribute 
materially to the total training program and would outweigh any 
possible dangers, such as taking saws, pliers, hammers, and other 
tools from the room in order to facilitate escape from the institution. 

Factors That Make for Good Learning.—The first session in the craft 
room was both interesting and enlightening. The girls were quick to 
report that they knew nothing about handling tools, clay, reed, and 
other materials, and they were relieved when it was explained that 
this was not to be a class where everyone worked upon the same thing. 
Instead, it was to be a time when they could work on anything that 
was of interest to them and they would not be required to learn the 
use of tools unless they wanted to do so. This procedure was so new 
to this particular group that they were somewhat suspicious as to 
what might come forth. The general attitude seemed to be, “Oh 
well, it’s something different anyway, so I might as well take a shot 
at it.” Several sample objects, such as the novelty pins that were 
so popular at the time, name pins that were made out of wood, 
simple book-racks and book-ends, hot pads, coasters, etc., were passed
around the group. Suggestions were made about things that could 
be made for their rooms, for birthday gifts and Christmas presents. 
Since Christmas was only a month off and money was scarce, the first 
deep interest was stimulated by satisfying a need. One of the impor- 
tant principles of learning is that the learner must see some connection 
between the thing being learned and his needs and desires, and the 
connection here was clear. The desire to give many gifts could be 
satisfied fully merely by learning to use some of the materials that 
were present. When this became clear to them, there was a readiness 
and a will to learn that carried them over some of the uninteresting 
details. 

As articles were completed they were placed on an exhibit shelf or
“Christmas shelf” as the girls called it. Within a few weeks there
was an amazing array of articles, and in going over the work of a single
girl it was possible to see the stages of improvement in workmanship. 
This is in line with the educational principle that improvement is 
more rapid when the learner comprehends his task and recognizes 
its harmony with his desires or needs. 

The girl is encouraged to follow her own interest in the craft room 
in the hope that her individual needs will be more fully met. These 
girls need satisfying experiences and there are many of them to be 
found in the craft room. There is the achieving of a purpose, the 
satisfaction that comes with the creating of a piece of work, the 
approval of the group and the leader, as well as other members of
the staff, and the satisfying of the desire for recognized progress in 
one’s work. 

The girls are committed to the institution for at least one year. 
Knowing that a girl will be in class that long makes it possible to spend 
as much time on a project as is necessary, and time is an important 
element in both learning and growth. The final results are never 
stressed because they matter much less than the method used to obtain 
them. This does not mean that goals have not been set up or that 
shoddy work is approved, but rather that learning to do a flawless 
piece of work is less important with this particular group than some 
of the concomitant learnings. For example, Marie made a simple 
stool to hold a flower pot. It was weak and wobbly and one of the 
legs needed a good deal of work with sandpaper, but Marie had so 
little patience that she could not sandpaper long at a time. She 
often gave up and threw the work aside. Gradually she began to 
admire well-finished wood and the desire to achieve it for herself made 
her more willing to pursue the task of sandpapering for longer periods 
of time. Marie needed to learn patience in other things also. Since 
there must be specific training for application if pupils are to apply 
what they learn, and since they learn what they practice, the leader 
worked closely with Marie over a period of three months. A sequence 
of projects was suggested that would call for patience. She was 
commended in craft and in other situations in the institution when 
she responded in the desired way. As meaning and relationship 
became clearer and as she was made aware of her progress, she was 
more and more able to apply what she had learned to a wider range 
of activities. 

It is rather simple to decide what learnings one would like to have 
carried over into other situations, and an honest effort may be made 
to facilitate the transference. The actual transfer cannot be achieved, 







however, unless the learner wants to learn and unless he is led to see 
the relationship. 

After four months in the craft activity we are aware that there 
has been a decided increase in the interest range. New interests 
have emerged and serve as motivation for still others. These interests 
have been stimulated by bringing to class objects that have been 
made by other groups, by making available books and pictures that 
can be used for ideas, by exhibits for the board members, and in 
various other ways. When interest begins to focus in any one direction 
it tends to become more specific, but it also has ramifications in other 
directions. For instance, Marjorie’s first interest was in keeping a 
scrap book showing interior views of rooms. The interest was quite 
general, any kind of room would do so long as it was in bright colors. 
That book was soon discarded in favor of one in which she placed only 
pictures that would go to make up a special room. That in turn 
led to an interest in furniture and to the making of miniature furniture 
in the craft room. Before she had completed two pieces she had 
decided to make a replica of her own bedroom. There is real meaning 
for her in this task and she is willing to spend any amount of time and 
effort in completing it. 

As new interests emerge new learnings are acquired. Many girls, 
who at first thought they would never want to use some of the tools, 
find now that to complete a project satisfactorily they must acquire 
some special skills. As the need arises these skills are introduced. 
Elsie had been using paint as it came from the jar—brilliant glaring 
reds, blues, and greens—for toys, but when she decided to paint 
flowers she wanted delicate pastel shades. It was necessary for her 
to learn how to mix paints in order to carry out her plan satisfactorily, 
and since she saw the relationship with her desires and needs, she 
was ready to learn how to mix paint. 

As new projects are suggested, an attempt is made to consider the 
stage of development already reached by the girl. The developmental 
skill levels in the group vary as markedly as the levels of intelligence. 
Some develop skills and techniques quickly and advance systematically 
to more difficult activities, while others remain for long periods of 
time on low levels of skill. Of course, there are many times when a
girl thinks she can do a piece of work that the leader knows will 
be too difficult for her, and this raises the question of whether it is 
wise to let her have several failures in order to convince her of her own 


level of development. 










Just as there are differences in developmental levels, there are 
numerous other differences in the individual girls also, differences in 
intelligence, emotional make-up, abilities, and interests. In this 
group it is not expected that every girl will like the same things or 
that skills and accomplishments will be the same. Because of differ- 
ences in the group there are varied activities from which they may 
choose. They are encouraged to work at their own rate of speed and 
not to expect results that are the same as some other girl may get. 

Interaction is a total process, a transaction between the entire 
individual and his whole environment. In this particular group the 
above principle is extremely important not only from the learning 
angle but in understanding some of the interactive patterns. The 
whole environment does not mean merely the room in which the girls 
are working or the total institutional environment to which they are 
subjected, but includes all the influences that have come to bear 
upon the girls from any source. In other words, individuals do not 
respond to stimuli in isolation, but are always affected by them as 
they relate to their whole situation. When Betty is asked to change 
her seat because of unkind remarks to her neighbor, the effect upon 
her may not be finished by the time she carries out the request; 
feelings of resentment and anger are produced which influence her 
later behavior. A letter early in the day from home or from a proba- 
tion officer that disappoints or angers a girl can have a decided effect 
upon her responses in the evening. Knowing the problems that are a 
part of the girl’s total environment helps in understanding some of 
the interactive patterns which emerge to defeat learning or to facilitate 
it. 

Factors That Make for Poor Learning.—The general setting of the 
craft class is of a nature that is not conducive to good learning. There 
are poor lighting and ventilation arrangements and the space is not 
sufficient. These factors make for poor learning. There is not an 
adequate supply of materials and tools. Occasionally five or six girls 
will want to work in clay at the same time, but the available funds 
permit only limited buying of clay so that these interests have to be 
redirected. This is not in line with good teaching methods. The 
shortage of tools means that long periods of waiting are necessary 
and this leads to impatience which all too often ends in quarrelling 
or displays of temper. 

Learning is affected by the conditions of the room, by the presence 
of others, by the notice one takes of what others are doing, by recent
conflicts; in short, by the entire setting in which one finds himself.
In any transaction between himself and the world about him the 
learner cannot isolate himself from any influences which play upon 
him. Since this is true, the social setting of the craft room offers 
many influences that do not favor effective learning. The girls are 
not permitted to talk with girls from another division of the institution 
except in class when it is considered necessary; yet, the craft class is 
composed of girls from two separate divisions. Immediately, for 
some, the interest shifts from craft projects to conversation. Considering
the set-up in the institution, this is a natural and normal
interactive pattern, but at the same time the leader is expected to 
see that no visiting is carried on between the groups. This artificial 
social situation hinders effective learning and encourages harmful 
learning. If the primary interest is to get a bit of choice gossip from a 
friend in another division, a girl may not be learning from the project 
on which she is working, but learns instead how to outwit the leader. 

The past experiences of these girls have not been of the kind that 
develop wholesome interests and curiosities. It is amazing to find 
adolescent girls so limited in interests, and apparently so unwilling 
to be stimulated. We know that good learning does not take place 
unless there is some interest or drive and unless there is present a 
readiness to learn. Sufficient stimulation has been lacking because 
of the nature of the group, but until some better way is found to make 
experiences more meaningful to the girls the will to learn will be little 
improved. 

Other factors that make for poor learning in this class are the 
desires to compete with others and to get results quickly at all costs. 
To finish a piece of work before Ruth does is the most important aim 
in Laura’s life right now. What the finished article will look like is 
second in importance. This is true in many cases and results in 
careless work and poor learning. 

In addition to the factors mentioned there is the obvious one that 
the situations mentioned earlier in the paper which contribute to 
effective learning are not applied to their fullest possibilities. Some 
are applied more than others and it is only through the knowledge 
gained in this course that improvements may be planned. 

Changes Necessary.—Probably the most beneficial change that 
could be made to facilitate greater and more effective learning would 
center around the general setting and social conditions. Plans are 
being made now to enlarge the room so that the overcrowded condition
will be corrected. With a larger room the factors of light and
ventilation can be remedied. Some arrangement should be made to 
have only the girls of one division in class at the same time. This 
would mean smaller groups where more individualized work could 
be carried on. 

Naturally, a more varied assortment of materials and more ade- 
quate equipment would be of tremendous help in the class. With 
the large number using the craft room, and with an extremely limited 
budget, it is almost impossible to keep supplies constantly on hand, 
and this causes interest to weaken and die out. 

Another point where considerable improvement is needed is in 
getting the girls to understand that people may be alike in some ways 
but that they may also be very different, that people who differ are 
not necessarily ‘‘queer.”’ The fact that so many of the girls are 
actually timid in admitting that they have interests and desires 
different from others in the group is due perhaps to the life in a correc- 
tional institution, where everyone is required to do the same thing 
at the same time and where group treatment is more common than 
individual treatment. In addition to this is the need for each girl 
to recognize her own level of development and not to feel inferior or 
throw temper tantrums when she discovers that a certain task is far 
too difficult for her. 

An effective change could be made in the realm of stimulation and 
motivation. This, however, is a very difficult problem with the group. 
The girls are necessarily confined to the house except on rare occasions 
and it is not surprising that they grow extremely tired of the environ- 
ment, so much so, in fact, that it takes an impressive or dramatic 
experience to snap them back into what we might call a normal 
reaction for adolescent girls. Occasionally we can furnish such an 
experience for a few of the girls and it always has a favorable effect. 
For example, Marjorie, who is making a replica of her own bedroom, 
was taken to see Mrs. Thorne’s miniature rooms at the Art Institute. 
She gained many new ideas and her interest in her own project was 
greatly stimulated. Two girls were taken to the flower show recently 
and they returned with new interests. They now are spending con- 
siderable time in mixing the desired shades of paint for a particular 
flower they saw at the exhibit. 

Many experiences, such as those above, are needed by all the 
girls. Since that cannot be arranged, the next best thing is to bring 
experiences to them. One way of doing this would be through the
use of movies, pictures on arts and crafts, and educational pictures.
Inviting interesting people to come in to talk with the group would 
provide some stimulation. An exhibit case in which good work could 
be displayed would stimulate the girl to exert her best effort in turning 
out a well-finished piece of work. Books, pictures, and work samples 
would serve a useful purpose in the stimulation of interest. 





DISCUSSION 


Any method which brings educational theories and actual experi- 
ences together offers many advantages in the teaching process. It 
assists in dispelling the tendency to value practice at the expense of 
theory when the fact is that students should learn that it is only poor 
theory which needs to be avoided. All adequate theories have modes 
of expression in corresponding experience. In the method of teaching 
used here students are advised to start either with educational prin- 
ciples or with actual experience, for by pushing either far enough 
both become more meaningful and significant. 

In this method of instruction the student often is motivated in a 
more effective way to study, observe, question and seek for meanings 
than in the traditional procedure even where principles are illustrated, 
usually from miscellaneous and unconnected experiences. In this 
use of field experiences the students are motivated both from the 
classroom and from the job angle. It is a matter of learning transcending
itself. One acquisition enriches another and so on.

Any curriculum will suffer difficulty which offers a number of 
different courses, taught by different instructors, unconnected and 
uncorrelated in any real sense. The method of teaching as described 
here gives a student the chance to bring insights from all of his courses 
to bear upon any one experience and thus to counteract somewhat the 
effect of piecemeal courses. In the case of Miss Moore we find her 
using learnings from such courses as one on “Guidance,” another on
“Group Work,” and another on “Health Education.” This student
is now working on the application of some of Moreno’s and Dimock’s 
techniques to the grouping problem at her institution. 

There are a number of specific learnings illustrated in Miss Moore’s 
paper in addition to some of the more general and background insights 
just discussed. The girls in this craft class gave evidence of impover- 
ishment in previous experience—little contact with tools, clay, and 
paints. This the instructor sought to remedy by the introduction
of the proper items and after a time she noticed the widening of
interests and of attention span. Then just as the student-instructor 
was applying material from a college classroom so the girls began to 
see the connection between their class and their own rooms in their 
furnishings, and their own relatives and friends in the making of 
presents. 

This class offered an excellent chance for the observation of indi- 
vidual differences and the part they play in learning. The instructor 
not only noted these differences but attempted to modify her individual 
and group approach because of them. There is hardly any phase in 
the field of psychology about which educators have written more than 
about individual differences. Yet there are few matters in which so
few are skilled at modifying their practice accordingly.

Finally, the instructor of this craft class showed insights about the 
nature of motivation so often overlooked. She pointed out the 
multiplicity of factors operating upon individuals in a learning situa- 
tion, many of which were not specific to the situation at hand. The 
skillful teacher is one who uses all of these transferred stimuli and 
who screens out the ones which operate negatively. 








PERFORMANCE OF MENTAL HOSPITAL PATIENTS 
ON THE WECHSLER-BELLEVUE AND THE 
REVISED STANFORD-BINET FORM L* 


MILDRED B. MITCHELL
United States Naval Reserve

Differences as great as 20 or 30 points were sometimes found
in the IQ’s of mental patients on the Wechsler-Bellevue and the
Revised Stanford-Binet Form L. It was in an attempt to understand
these discrepancies that this study was undertaken.


SUBJECTS 


The Wechsler-Bellevue and the Revised Stanford-Binet Form L 
were given to forty-one adolescent boys at Bellevue Hospital, New
York City; thirty-two patients at the Psychopathic Hospital of the
State University of Iowa;† ninety-two patients at the Mt. Pleasant
State Hospital, Iowa; and one hundred three patients at the Independence
State Hospital, Iowa;‡ making a total of two hundred sixty-eight
subjects. The subjects were mostly new admissions being
examined routinely. Patients were excluded who were too uncoöperative
to give reliable results or whose mental condition changed
obviously between the giving of the two tests. Since the tests were 
given usually within a few days of each other, only a few catatonics 
had to be excluded. Cases were selected in order to obtain adequate 
samplings at the higher and lower chronological-age levels as well 
as at the higher and lower mental-age levels. 

Although all the subjects were patients in mental hospitals, not 
all were psychotic. The boys at Bellevue were for the most part 
merely delinquent. Forty-seven of the Iowa patients were chronic 
alcoholics without psychosis. These are included in the study as a
whole, and are compared with the schizophrenics. There were too 
few of any other psychiatric classification to be considered separately. 





* This paper was read at the Dallas meeting of the American Association for 
the Advancement of Science, December 1941. 

† The Wechsler-Bellevue was given here by Daniel Prager and Form L by
Elizabeth Kuntz Hamstra.

‡ Part of the Form L was given by Daniel Prager who worked up the data on
the first one hundred fifty-five Iowa cases for a Master’s thesis at the State
University of Iowa in 1940.


METHOD 


The Revised Stanford-Binet Form L was given down to two 
successive levels of passes and up to two successive levels of failures 
or until the end of the scale was reached. The Wechsler-Bellevue 
vocabulary was generally given as a separate test, but was not used 
in figuring the IQ and is not reported in this study.

The Wechsler-Bellevue consists of ten separate tests (not including 
the vocabulary), five of which are verbal and five non-verbal. The 
IQ’s were figured for both the verbal and performance scales as well 
as for the test as a whole. 


RESULTS 


The mean IQ of forty-one adolescent boys at Bellevue was 82.3 
on the Revised Stanford-Binet Form L and 79.6 on the Wechsler- 
Bellevue. The mean IQ for two hundred twenty-seven Iowa patients 
was 87.0 on Form L and 89.4 on the Wechsler-Bellevue (Table I). 


TABLE I. MEAN, SIGMA, MEDIAN AND RANGE FOR TWO HUNDRED TWENTY-SEVEN
IOWA PATIENTS ON THE REVISED STANFORD-BINET FORM L, THE
WECHSLER-BELLEVUE TOTAL, THE WECHSLER-BELLEVUE VERBAL SCALE
AND THE WECHSLER-BELLEVUE PERFORMANCE SCALE

Test                                   Mean    SD       Median   Range
Revised Stanford-Binet Form L          87.0    26.52    89       28-150
Wechsler-Bellevue total                89.4    19.57    90       39-134
Wechsler-Bellevue verbal scale         90.7    18.97    90       52-132
Wechsler-Bellevue performance scale    89.4    19.19    91       39-134





The differences cannot be considered statistically significant, but we 
might note in passing that the younger group tested lower on the 
Bellevue and the older group tested higher on the Bellevue. We shall 
show later that this tendency is marked for the whole group of patients. 
The sigma for the Iowa patients was 26.52 on the Binet and 19.57 on 
the Bellevue. The mean on the verbal scale was 90.7 and on the 
performance scale 89.4. 

The highest correlation, .91, was found between the Binet and the 
verbal scale of the Bellevue (Table II). The correlation was nearly 
as high, .89, between the total Wechsler-Bellevue and Form L. The 
Binet correlated practically the same with the performance scale as
the verbal scale did; the r’s were .804 and .806, respectively.
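
The comparison of the two means for the Iowa patients can be checked with a short calculation. The sketch below assumes the usual critical ratio for correlated means, since the same patients took both tests. The paper does not state which formula was used, so the function name and the formula are assumptions; only the means, sigmas, N, and r are taken from the text.

```python
import math

def critical_ratio_correlated(mean_a, sd_a, mean_b, sd_b, r, n):
    """Critical ratio for the difference between two correlated means.

    Assumed formula (not stated in the paper):
        SE_diff = sqrt(SE_a^2 + SE_b^2 - 2 * r * SE_a * SE_b)
    """
    se_a = sd_a / math.sqrt(n)
    se_b = sd_b / math.sqrt(n)
    se_diff = math.sqrt(se_a**2 + se_b**2 - 2 * r * se_a * se_b)
    return (mean_b - mean_a) / se_diff

# Values reported in the text for the 227 Iowa patients:
# Form L mean 87.0, sigma 26.52; Wechsler-Bellevue mean 89.4, sigma 19.57; r = .89.
cr = critical_ratio_correlated(87.0, 26.52, 89.4, 19.57, r=0.89, n=227)
print(round(cr, 2))
```

Under these assumptions the ratio falls somewhat below 3, which is in keeping with the statement that the means on the two tests do not differ significantly.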

TABLE II. CORRELATIONS BETWEEN THE REVISED STANFORD-BINET FORM L
AND THE WECHSLER-BELLEVUE TOTAL, WECHSLER-BELLEVUE VERBAL AND
WECHSLER-BELLEVUE PERFORMANCE FOR TWO HUNDRED TWENTY-SEVEN
IOWA PATIENTS

Test                                    r      Ratio
Form L vs. Wechsler-Bellevue total      .89    2.8
Form L vs. verbal scale                 .91    4.5
Form L vs. performance scale            .80    2.2
Verbal vs. performance                  .81    1.7

TABLE III. MEAN CA OF ALL PATIENTS, THOSE SCORING HIGHER ON THE BINET,
AND THOSE SCORING HIGHER ON THE BELLEVUE GROUPED ACCORDING TO
DIFFERENCES IN IQ BETWEEN THE TWO TESTS

                     All patients            Binet higher    Bellevue higher   Difference
IQ difference    N     Per cent   Mean CA    N     Mean CA   N     Mean CA     in mean CA
20+              21       8       43.0         5   26.6       16   48.2        21.6
15-19            41      15       38.5        16   25.0       25   47.1        22.1
10-14            50      18       34.0        25   28.3       25   39.7        11.4
5-9              80      30       31.5        43   28.7       37   34.6         5.9
1-4              71      28       30.5        39   29.9       32   31.8         1.9
0                 5       2       26.6
Total           268                          128             135

Although the means are not significantly different and the correlations
are high between the two tests, certain cases showed large
differences (Table III). These occurred most frequently for the young
adults and older adults. (The average CA for the two hundred sixty- 
eight patients was 33.6 years.) Twenty-one patients were found with 
differences of twenty or more points between their IQ’s on the two 
tests. Of these, sixteen scored higher on the Wechsler-Bellevue. 
Their average CA was 48.2,—nearly twice that of the five patients who 
scored twenty or more points higher on the Binet than on the Bellevue. 
The differences in CA are about the same for the forty-one patients 
scoring a difference between fifteen and nineteen IQ points on the two 
tests. Twenty-five of these scored higher on the Bellevue; their mean 
CA was 47.1. Sixteen scored higher on the Binet; their mean CA 
was only 25.0. The mean CA’s for all the patients get consistently 
greater as the mean differences in IQ increase. They range from a
mean CA of 26.6 for the five cases with zero difference in IQ to 43.0
for the group with a difference of twenty or more points. The mean
difference in CA is less than two years for differences in IQ of less
than five points. For the groups with more than ten points difference 
in IQ, the mean CA is always much higher for the groups testing 
higher on the Bellevue than for those testing higher on the Binet. 
Since large differences in mean IQ seemed to occur at the extremes
of chronological age for our subjects, the differences in the mean IQ
for each decade were figured (Table IV). In the thirties, the mean
IQ’s are practically identical on the two tests. Younger patients
tended to test higher on the Binet and older patients tended to test 
higher on the Bellevue. 
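
The decade-by-decade comparison can be sketched as a simple grouping step. The records, field order, and function name below are hypothetical and serve only to illustrate the tabulation; they are not the study's data.

```python
from collections import defaultdict

def mean_difference_by_decade(records):
    """Group (age, binet_iq, bellevue_iq) records by decade of age and
    return {decade_start: (n, mean_binet, mean_bellevue, mean_difference)}."""
    groups = defaultdict(list)
    for age, binet, bellevue in records:
        groups[(age // 10) * 10].append((binet, bellevue))
    summary = {}
    for decade, pairs in sorted(groups.items()):
        n = len(pairs)
        mean_binet = sum(b for b, _ in pairs) / n
        mean_bellevue = sum(w for _, w in pairs) / n
        # Positive difference: the group tested higher on the Binet.
        summary[decade] = (n, mean_binet, mean_bellevue, mean_binet - mean_bellevue)
    return summary

# Hypothetical records: (chronological age, Binet IQ, Bellevue IQ).
sample = [(17, 90, 86), (19, 80, 78), (34, 95, 95), (52, 70, 82), (58, 80, 90)]
for decade, row in mean_difference_by_decade(sample).items():
    print(decade, row)
```

A positive difference means the group tested higher on the Binet, and a negative one means it tested higher on the Bellevue.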
TABLE IV. MEAN IQ’S AND DIFFERENCES IN IQ FOR TWO HUNDRED SIXTY-EIGHT
PATIENTS ON THE REVISED STANFORD-BINET FORM L AND THE WECHSLER-BELLEVUE
GROUPED ACCORDING TO CHRONOLOGICAL AGE

CA       N     Mean IQ Binet   Mean IQ Bellevue   Difference
10-19    56    85.1            81.1                 3.8
20-29    56    91.4            87.4                 4.0
30-39    64    93.7            94.1                -0.4
40-49    51    80.1            87.3                -7.2
50-59    30    75.4            87.1               -11.7
60-69    11    82.4            92.1                -9.7











The differences between the sigmas on the two tests, 26.52 for the 
Binet and 19.57 for the Bellevue, would suggest that the Binet would 
have more low scores and more high scores than the Bellevue. This 
is found to be true (Table V). Seventy-five patients scored below 70 
on the Binet, while only fifty-one scored below 70 on the Bellevue. 
Similarly, at the other end of the scale, thirty-one patients scored 
above 119 on the Binet while only fourteen scored above 119 on the 
Bellevue. When arranged according to IQ on the Binet, the differ- 
ences in the means ranged from —24 when the IQ’s were in the 20’s 
to +19 when the IQ’s were in the 150’s (Table V). The difference 
was only a fraction of a point when the IQ’s were in the 90’s. This 
phenomenon is not merely a regression due to arranging the IQ’s
according to the Binet, for the same type of phenomenon is found when 
they are arranged according to scores on the Bellevue (Part B of 
Table V). 
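
The link between the larger sigma and the larger number of extreme scores can be illustrated with a rough calculation. The sketch assumes normally distributed IQ's, which the paper does not claim, and uses the means and sigmas reported for the Iowa patients; the function names are illustrative only.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def tail_shares(mean, sd, low=70, high=119):
    """Expected share of scores below `low` and above `high` under normality."""
    below = normal_cdf(low, mean, sd)
    above = 1.0 - normal_cdf(high, mean, sd)
    return below, above

binet_low, binet_high = tail_shares(87.0, 26.52)        # Form L mean and sigma
bellevue_low, bellevue_high = tail_shares(89.4, 19.57)  # Wechsler-Bellevue mean and sigma
print(round(binet_low, 2), round(bellevue_low, 2))
print(round(binet_high, 2), round(bellevue_high, 2))
```

Under this assumption the test with the larger sigma is expected to place a larger share of patients in both tails, which is the direction of the counts reported above.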

TABLE V. MEAN IQ’S AND MEAN DIFFERENCES IN IQ FOR TWO HUNDRED
SIXTY-EIGHT PATIENTS ON THE REVISED STANFORD-BINET FORM L AND THE
WECHSLER-BELLEVUE

A. Grouped According to IQ on the Revised Stanford-Binet

IQ on Form L    N     Mean IQ Form L   Mean IQ Bellevue   Difference
20-29            1     28.0             52.0              -24.0
30-39            3     35.3             55.3              -20.0
40-49           16     45.1             62.0              -16.9
50-59           27     54.7             65.8              -11.1
60-69           28     64.4             69.7               -5.3
70-79           30     74.4             79.2               -4.8
80-89           36     84.0             86.9               -2.9
90-99           46     94.3             94.0                0.3
100-109         31    104.1             98.6                5.5
110-119         19    113.4            109.6                3.8
120-129         20    124.8            114.7               10.1
130-139          7    134.1            118.4               15.7
140-149          3    143.7            130.0               13.7
150-159          1    150.0            131.0               19.0

B. Grouped According to IQ on the Wechsler-Bellevue

IQ on Bellevue  N     Mean IQ Form L   Mean IQ Bellevue   Difference
20-29            0
30-39            1     67.0             39.0               28.0
40-49            3     46.3             42.7                3.6
50-59           19     50.8             55.9               -5.1
60-69           28     58.2             64.9               -6.7
70-79           43     68.2             74.6               -6.4
80-89           44     83.0             84.1               -1.1
90-99           49     91.7             94.8               -3.1
100-109         42    105.8            103.7                2.1
110-119         25    118.0            113.7                4.3
120-129         11    128.5            123.4                5.1
130-139          3    146.3            131.7               14.6

Of course, when working with psychotics, a low IQ does not necessarily
mean that the patient is mentally defective. Since, however,
the diagnosis of mental deficiency is frequently based largely on the
IQ, any differences in IQ at the lower levels are of particular interest.
For this reason, those testing below 70 IQ on either or both tests are
now considered (Table VI). Forty-six patients tested below 70 on
both tests; twenty-nine tested below 70 on the Binet but not on the
Bellevue; five tested below 70 on the Bellevue but not on the Binet.
The mean CA for the group testing below 70 on both was 31.8—about 
average for all patients. The mean CA for the group testing below
70 on the Binet only was about thirteen years older, i.e., 44.7. Eighty-three
per cent of these were above the mean age for the entire group
of patients. On the other hand, the mean CA of the few cases testing 
below 70 on the Bellevue only was much younger, 22.8 years. Thus 
here, with the low-grade groups, as with the patients as a whole, those 
chronologically older tend to test higher on the Bellevue and those 
chronologically younger tend to test higher on the Binet. 


TABLE VI. MEAN IQ, CHRONOLOGICAL AGE, LAST GRADE REACHED, AND WORDS
CORRECT ON REVISED STANFORD-BINET VOCABULARY OF PATIENTS BELOW 70
IQ ON ONE OR BOTH THE REVISED STANFORD-BINET AND THE WECHSLER-BELLEVUE

                         Mean IQ   Mean IQ    Mean    Mean        Mean
Below 70 IQ on     N     Form L    Bellevue   CA      education   vocabulary
Both tests         46    52.7      59.0       31.8     6.1        10.5
Binet only         29    59.0      76.6       44.7     6.3        12.3
Bellevue only       5    75.4      66.5       22.8    10.0        19.3





It was felt that a study of the patients’ education and vocabulary 
scores might suggest differences in original ability or deterioration 
between the groups scoring low on one or both tests. The last grade 
reached in school and the Revised Stanford-Binet Vocabulary scores 
were available on most of the patients (Table VI). There is practically
no difference in the mean grade reached for the patients scoring below
70 on both tests and those scoring below 70 only on the Binet
(grade 6.1 and grade 6.3, respectively). The mean vocabulary score
was about two words higher for the group testing above 70 on the
Bellevue, 12.3 in contrast to 10.5 for the group testing below 70 on
both tests. The group testing higher on the Binet was definitely
superior to the other two groups in education and vocabulary, mean 
education grade 10.0 and vocabulary 19.3 words. There were too 
few in this group to draw any conclusions, however. 

As mentioned earlier, the only two types of diagnosis occurring 
frequently enough in our population to warrant special consideration 
are chronic alcoholism without psychosis and schizophrenia (including
all types of dementia praecox). The alcoholics average about ten 
years older chronologically than the schizophrenics—forty years for 
the former and thirty years for the latter (Table VII). This probably 
accounts for the fact that the mean IQ for the alcoholics is slightly
higher on the Wechsler-Bellevue than on Form L, while the mean
IQ for the schizophrenics is slightly higher on Form L than on the
Wechsler-Bellevue. The most significant differences appear when the
results on the verbal and performance scales are compared. The alcoholics
score only 1.5 points higher on the verbal than on the performance
scale, while the schizophrenics score 4.6 points higher on the verbal
than on the performance scale.* This might be expected from the 
fact that the difference between verbal and performance tests has 
frequently been considered an indication of deterioration. 

TABLE VII. COMPARISON OF CHRONIC ALCOHOLICS AND SCHIZOPHRENICS ON THE
REVISED STANFORD-BINET FORM L AND THE WECHSLER-BELLEVUE (INCLUDING
IQ FOR THE TOTAL TEST, VERBAL SCALE AND PERFORMANCE SCALE)

                                           Wechsler-Bellevue
Diagnosis         N     CA      Form L     Total    Verbal   Performance
Schizophrenics    59    30.2    88.9       87.6     90.7     86.1
Alcoholics        47    39.8    91.2       93.6     93.8     92.3














SUMMARY AND CONCLUSIONS 


The Revised Stanford-Binet and the Wechsler-Bellevue were given 
to two hundred sixty-eight adolescent and adult patients in four 
institutions for mental cases. The mean IQ’s on the two tests were 
not found to be significantly different. The correlations were high. 
The IQ’s were found to be higher on the Binet for younger patients, 
higher on the Bellevue for older patients, and about the same for 
patients in their thirties. Patients of low intelligence tended to test 
lower on the Binet and brighter patients tended to test higher on the 
Binet than on the Bellevue. Patients near average in intelligence 
tended to test about the same on both tests. Schizophrenics showed a 
greater difference between their mean scores on the verbal and per- 
formance scales of the Bellevue than did the chronic alcoholics. 


REFERENCES 
1. Terman, L. M. and Merrill, M. A.: Measuring Intelligence. Boston: Houghton
   Mifflin, 1937, pp. xi + 460.
2. Wechsler, D.: The Measurement of Adult Intelligence. Baltimore: Williams and
   Wilkins, 1941, pp. xi + 244.





* However, a recent unpublished study made by the writer of approximately 
seventy-five cases using the Shipley-Hartford Retreat gave practically identical 
Conceptual Quotients for the chronic alcoholics and psychotics. 











REPRESENTATIVENESS OF COLLEGE STUDENTS 
WHO RECEIVE COUNSELING SERVICES* 


GWENDOLEN G. SCHNEIDLER AND RALPH F. BERDIE 


University Testing Bureau, University of Minnesota 


Counseling services, including procedures of clinical analysis, 
diagnosis, and treatment, are available on a voluntary basis to students 
at the University of Minnesota, not only within college advisory 
programs, but also on a university-wide basis. Since this is not a 
required aspect of the personnel program, one hundred per cent of 
the student body has never come for counseling services to the Uni- 
versity Testing Bureau, the all-university agency to which an enrolled 
student or anyone planning to enroll may apply for educational, 
vocational, or personal counseling. Although the proportion of stu- 
dents who avail themselves of this service has been constantly increas- 
ing, it still remains considerably less than the total number of students. 

The question arises as to the extent to which those who come
for counseling are different from or similar to the total student popula- 
tion. Frequently the assumption is made that persons receiving 
such special services must be “different.” This attitude is probably
carried over from the days when “cases” were obviously significant
deviates. 

Several purposes underlay this investigation to determine the 
characteristics and representativeness of students who were counseled 
at the University Testing Bureau during the academic year 1939-1940. 

Answers to these questions were wanted: Is the University Testing 
Bureau serving a representative sampling of students at the University 
or is it catering to a selected group? Is it spending its efforts on the 
academically maladjusted? Are they likely to be the personality 
deviates? Are these cases significantly different in regard to academic 
aptitude, academic information, personality traits, interest factors 
and high-school scholarship from the total college and class populations 
of which they are members? If so, in what direction is this difference 
manifest? Does the counseling aspect of the personnel program 
function mainly for college misfits? What types of students receive
vocational, educational, and personal counseling at the Testing
Bureau?

* Grateful acknowledgment is made to Dr. John G. Darley for helpful
suggestions.

Assistance in the preparation of these statistics was furnished by the personnel
of Work Progress Administration, Official Project No. 165-1-71-124, State Project
No. 3-1805-40145 Sub-Project No. 370.

Answers to such questions are important for several reasons. They 
have definite administrative significance. Educators should be 
interested to know, for example, whether they are supporting personnel 
programs which facilitate the adjustment of good students or whether 
they are by these means attempting to keep the low ability groups 
in school. Students’ attitudes are also influenced by the stereotypes 
they have regarding the characteristics of students coming to such an 
agency. The types of personnel workers employed in a personnel 
program and the techniques they use are also determined by the 
characteristics of the students served. 

In addition to the administrative value of this information is its 
significance for future research. If students with extensive case 
records on file are typical of their college and class groups, then such 
existing clinical data become more valuable and have broader useful- 
ness in terms of research. These data are extremely significant 
from the standpoint of the information they can furnish in numerous 
types of investigations in student personnel research. The establishment
of test norms, for example, is only one research area which
might be served. Clinical hypotheses also depend upon the type of 
student with which the personnel worker comes in contact. 

Such information could be used to describe the characteristics of 
larger student populations in terms of the extensive clinical data 
available for the smaller sample. Valuable information could thus 
be made available regarding the student body as a whole—the typical 
problems they have, their needs, interests, abilities, ambitions and 
plans. If generalizations could be made to total populations from 
small samples studied intensively, the instructional and administrative 
staffs would be provided with much more complete and accurate 
pictures of their ‘‘consumers” and thus would be in a better position 
to provide curricula and extra-curricular activities adapted to meet 
their particular needs and characteristics. One of the few systematic 
studies of this type was done in the General College of the University 
of Minnesota.¹


METHOD 


This study has compared, on the basis of certain variables, the 
students from various colleges and classes who came to the University 
Testing Bureau with the total college and class groups from which
the counseled cases were drawn. The critical ratios obtained by
comparing means furnish an indication of the extent to which the 
University Testing Bureau samples are representative of the total 
populations from which they come. 
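
As an illustration of the comparison just described, the sketch below computes a critical ratio as the difference between two means divided by the standard error of that difference. The independent-groups form of the standard error, the function name, and the figures are assumptions made for the example; the study's own data are not reproduced.

```python
import math

def critical_ratio(mean_sample, sd_sample, n_sample, mean_total, sd_total, n_total):
    """Difference between means divided by the standard error of the difference.

    Assumes the simple independent-groups standard error; the counseled
    students are in fact part of the total group, so this is only approximate.
    """
    se_diff = math.sqrt(sd_sample**2 / n_sample + sd_total**2 / n_total)
    return (mean_sample - mean_total) / se_diff

# Hypothetical percentile-score figures for a counseled group and its class.
cr = critical_ratio(52.0, 10.0, 100, 50.0, 10.0, 400)
print(round(cr, 2), "significant" if abs(cr) >= 3.0 else "not significant")
```

The level of 3.0 in the last line follows the criterion the authors apply when reporting their results.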

Tests of scholastic aptitude and English achievement are given 
annually to all seniors graduating from Minnesota high schools. 
Scores on these tests are assembled for each student entering one of 
the State colleges and lists of students’ ratings, including the high- 
school rank, are constructed. Population statistics for freshmen 
were obtained from these lists. 

Freshmen in the various colleges who came to the University 
Testing Bureau were compared on the American Council on Education 
Psychological Examination scores, Coöperative English Test scores
and high-school percentiles with the total groups of freshmen of the 
same year who were entering the same colleges. The same procedure 
was followed in comparing upper classmen coming to the University 
Testing Bureau to the college and class groups upon the basis of these 
entrance ratings. 

Several forms of the American Council on Education Psychological 
Examination have been used at the University and all scores were 
equated to scores on the 1937 form through the use of tables* supplied
by the Coöperative Test Service. All scores on the Coöperative
English Test were equated to scores on the 1938 form OM.
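
The equating of forms by percentile, as explained in the footnote to this paragraph, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the scores are hypothetical, and actual equating tables would normally smooth or interpolate between score points rather than pick the nearest observed score.

```python
from bisect import bisect_right

def equate_score(score, form_a_scores, form_b_scores):
    """Map a raw score on form A to the form B raw score at the same percentile.

    Both score lists are assumed to come from the same group of students.
    """
    a = sorted(form_a_scores)
    b = sorted(form_b_scores)
    # Number of form A scores at or below the given score.
    count = bisect_right(a, score)
    # Position in form B with the same cumulative proportion (ceiling division).
    index = -(-count * len(b) // len(a)) - 1
    # Keep the position inside the list for scores below or above every case.
    index = max(0, min(len(b) - 1, index))
    return b[index]

# Hypothetical raw scores of five students on two forms of a test.
form_a = [10, 20, 30, 40, 50]
form_b = [15, 25, 35, 45, 55]
print(equate_score(30, form_a, form_b))
```

As the footnote observes, the method is justified only when the forms correlate highly, since it treats equal percentile standing on the two forms as equal performance.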


RESULTS 


Table I shows the critical ratios obtained when the University 
Testing Bureau freshmen are compared to the all-university freshmen
on these entrance data. The critical ratios (all below 3.0) indicate that 
there are no significant differences between University Testing Bureau 
and all-university freshmen in respect to high-school scholarship, 
aptitude for college work, or knowledge of the mechanics of English.†





* These tables were obtained by giving several forms of the same test to the 
Same group of students and determining the raw scores on the different forms 
corresponding to the same percentile score. The justification for this method 
depends on high correlations among the tests involved. The correlations among 
different forms of both the American Council on Education Psychological Exami- 
nation and the Coéperative English Test are all very high. 

† Pearson’s method of determining the significance of the difference between
the mean of a subgroup and the mean of the total group of which the subgroup
is a part was applied, but found to give consistently the same results as the
customary critical ratio (Difference ÷ SE of Difference).
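The customary critical ratio named in this footnote is the difference between two means divided by the standard error of that difference. The sketch below is an illustration under stated assumptions: it treats the two groups as independent samples (which the subgroup and its parent group are not, hence the footnote's reference to Pearson's method), the figures are hypothetical, and the cutoff of 3.0 is inferred from the text's remark that all ratios fall below 3.0.

```python
import math


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two means divided by the standard error of that difference."""
    # Assumption: the two groups are treated as independent samples.
    se_difference = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se_difference


# Hypothetical figures, not taken from Table I.
cr = critical_ratio(52.0, 10.0, 100, 50.0, 10.0, 100)
meets_cutoff = abs(cr) >= 3.0  # cutoff assumed from the text's "all below 3.0"
```

A positive value corresponds to the first group having the higher mean, matching the sign convention noted under the tables.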







Table I also shows the comparison between University Testing 
Bureau sophomores and juniors and comparable total college popula- 
tions on the same entrance data. University Testing Bureau sopho- 
mores from three colleges (Science, Literature and the Arts; Institute 
of Technology; and Agriculture, Forestry and Home Economics) were 
compared on the same three measures with all sophomores registered 
in those colleges. Juniors from two colleges (Science, Literature
and the Arts and the School of Business Administration) were
compared on these three measures with all juniors registered in those
colleges. As the Testing Bureau case load consists mainly of freshmen
and sophomores, the number of cases among the upper classmen was
often too small to include here. All but one of the critical ratios are
insignificant. The small number of cases of Science, Literature, and the
Arts junior women who had come to the University Testing Bureau 
and who were inferior to junior Science, Literature, and the Arts 
women in general tends to make that single difference unimportant. 
We can say that upper classmen coming to the University Testing 
Bureau are nearly as much like their total college populations in 
respect to entrance data as are the freshmen, although the small dif- 
ferences are not in favor of those coming to the Bureau. 

A second part of this investigation involved a more intensive com- 
parison of University Testing Bureau freshmen registered in the 
College of Science, Literature, and the Arts with freshmen in that 
college who had not been served at the University Testing Bureau. 
The following variables* were involved: Achievement as measured
by the Coöperative achievement tests in mathematics, natural
sciences, and social sciences; the Minnesota Personality Scale (yielding
five scores: I, morale; II, social adjustment; III, family adjustment;
IV, emotionality; and V, economic conservatism); the Strong Vocational
Interest Blank with thirty-four occupational keys for men and
sixteen occupational keys for women plus ratings for “masculinity-femininity”
of interests and “occupational level” of interests.





* These tests were given to five hundred seventy-seven freshmen men and 
five hundred fifty-seven freshmen women in the College of Science, Literature, and 
Arts in the Fall of 1939 as part of an experimental program. Approximately 
eighty per cent of the total freshman class took these tests. These comparisons 
are made between students who came to the Testing Bureau and students who did 
not come in the year 1939-1940. The previous comparisons were made between 
students coming to the Testing Bureau and the total college classes from which 
they were drawn, inclusive of those coming to the Bureau. 





TABLE I. UNIVERSITY TESTING BUREAU STUDENTS, CLASSIFIED BY COLLEGE AND CLASS,
COMPARED TO THE TOTAL GROUP OF UNIVERSITY STUDENTS REGISTERED IN THOSE
COLLEGES AND CLASSES ON HIGH-SCHOOL PERCENTILE, AMERICAN COUNCIL
PSYCHOLOGICAL EXAMINATION AND COÖPERATIVE ENGLISH TEST

Columns: Class; College; Sex; Group; N and CR for high-school percentile,
American Council examination, and Coöperative English test. (Table entries
are illegible in this copy.)

* + sign before the critical ratio indicates the mean of the UTB was higher.










The comparisons are shown in Table II and indicate that, in general,
Science, Literature, and the Arts freshmen students who came to
the Testing Bureau for counseling are quite similar, in respect to these
other variables, to the Science, Literature, and the Arts freshmen
students who did not come. In measured achievement, measured
personality traits and measured interests,* there is little difference on the
average between the freshmen Science, Literature, and the Arts
students who came to be counseled and their classmates who did not
come.


TABLE II. COMPARISON OF UTB CASES AND NON-UTB CASES FOR S.L.A.
FRESHMEN, 1939-1940, ON THE PERSONALITY SCALE, MATHEMATICS ACHIEVEMENT
TEST, SOCIAL SCIENCES ACHIEVEMENT TEST, NATURAL SCIENCES ACHIEVEMENT
TEST, AND SELECTED STRONG INTEREST INVENTORY KEYS

Test                      Sex     N (UTB)   N (Non-UTB)   Critical ratio*
Personality scale I       Men       204         373           +1.00
                          Women     203         354            .38
Personality scale II      Men       204         373           +1.02
                          Women     203         354            .33
Personality scale III     Men       204         373           1.06
                          Women     203         354           1.79
Personality scale IV      Men       204         373           +1.45
                          Women     203         354            .31
Personality scale V       Men       204         373           2.18
                          Women     203         354           2.37
Mathematics               Men       204         373           +2.70
                          Women     203         354           +2.66
Natural sciences          Men       204         373            .32
                          Women     203         354           +1.21
Social sciences           Men       204         373           +1.08
                          Women     203         354           +1.87
Occupational level        Men       204         373           1.60
Masculinity-femininity    Men       204         373           + .86
                          Women     203         349           + .03
































* + signs before the Critical Ratio indicate the mean of the UTB group was higher. 


It is planned to collect data on the college scholarship of these 
two groups to determine for this variable also whether or not the stu- 
dents who seek the counseling services are typical. 





* Statistics are not presented for the occupational keys of the Strong Vocational
Interest test. These comparisons did not show significant differences between the
groups.













CONCLUSIONS 


Students from the various colleges and classes of the University 
of Minnesota who come to the University Testing Bureau for counsel- 
ing do not differ significantly from their classmates in respect to 
aptitude for college work, high-school scholarship, or achievement 
in English. 

For a special group of Science, Literature, and the Arts freshmen, 
it was found that those who came to the University Testing Bureau 
did not differ from those who did not come in respect to these same 
variables and in addition did not differ in respect to achievement in 
natural sciences, social studies and mathematics. They resembled 
each other in measured personality traits such as morale; social, 
family, and emotional adjustment; and economic conservatism. 
They were alike in respect to measured interests, masculinity or
femininity of interests, and “occupational level.”

These findings do not allow too great generalizations, but they 
give considerable assurance that the students coming to the University 
Testing Bureau for counseling are not a distinctly atypical group. 
The use of extensive clinical data collected from samples appears 
justified in attempting to arrive at a picture of the characteristics of 
students in general who are from the same populations. This particu- 
lar student personnel agency is serving students who, on the basis 
of the variables studied, can not be distinguished from the University 
populations of which they are a part. 


BIBLIOGRAPHY 


(1) Darley, John G. and Williams, Cornelia: “The General College Personnel 
Service and Personnel Research Program,” Part 1, pp. 4-170 in Report on 
Problems and Progress of the General College, University of Minnesota. 
University of Minnesota Mimeograph Department, 1939. 

(2) Schneidler, Gwendolen G. and Berdie, Ralph F.: “Educational Ability
Patterns,” Journal of Educational Psychology, vol. 33, Feb., 1942,
pp. 92-104.

(3) Schneidler, Gwendolen G. and Berdie, Ralph F.: “Educational Hierarchies
and Scholastic Survival,” Journal of Educational Psychology, vol. 33, March, 1942,
pp. 199-208.





INFLUENCE OF LINE WIDTH ON EYE MOVEMENTS 
FOR SIX-POINT TYPE¹


DONALD G. PATERSON AND MILES A. TINKER 


University of Minnesota 


By the use of reading performance tests we found that a very short
line width (seven picas) and an excessively long line width (thirty-six
picas) produced for six-point type an important retardation in speed
of reading in comparison with an optimum line width of about fourteen
picas.²

In order to determine the specific patterns of eye movements 
involved in reading optimal versus non-optimal line widths for six- 
point type set solid, we used the Minnesota eye-movement camera. 
Samples of the stimulus material used are shown in Fig. 1. 


TABLE I. MEAN EYE-MOVEMENT MEASURES FOR TWENTY COLLEGE STUDENTS
Reading Six-Point Text Set in Thirteen-Pica and Five-Pica Line Widths

Note: Pause duration and perception time are reported in seconds. Ten
paragraphs from the Chapman-Cook Speed of Reading Test, Form A, were set in
a thirteen-pica line width and ten paragraphs from Form B of the same test were
set in a five-pica line width. All paragraphs were printed as follows: Scotch
Roman, lower case, six point, set solid on egg-shell paper stock.

Line width             Fixation    Words per   Pause      Perception   Regression
                       frequency   fixation    duration   time         frequency
13 picas                213.0       1.46        0.23        50.3         28.3
5 picas                 235.0       1.33        0.27        62.3         14.0
Differences¹           +22.0       −0.13       +0.04       +12.0        −14.3
Per cent difference    +10.3       −8.9        +14.3       +24.4        −50.4

¹ All differences are statistically significant beyond the one per cent level.
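The per cent differences in Table I can be checked with a short calculation. The sketch below is illustrative and assumes the thirteen-pica mean serves as the baseline; the function name is hypothetical. For fixation frequency and words per fixation the recomputed values round to the printed +10.3 and −8.9. For some other measures (pause duration, for instance) recomputation from the rounded means does not reproduce the printed figure exactly, presumably because the originals were computed from unrounded means.

```python
def percent_difference(baseline, comparison):
    """Change from baseline to comparison, as a percentage of the baseline."""
    return 100.0 * (comparison - baseline) / baseline


# Means printed in Table I: thirteen-pica value first, five-pica value second.
# Assumption: the thirteen-pica line is the baseline.
fixation_frequency = percent_difference(213.0, 235.0)
words_per_fixation = percent_difference(1.46, 1.33)

print(round(fixation_frequency, 1))
print(round(words_per_fixation, 1))
```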


The photographic records yielded the following analytical meas- 
ures: Number of fixations, number of words per fixation, duration of 
pauses, perception time and number of regressions. 

The results are shown in Tables I and II together with a statement 
of the printing arrangements used. 





¹ The writers are grateful to the University of Minnesota Graduate School for
a research grant to finance this study.
² Paterson, D. G. and Tinker, M. A.: How to Make Type Readable. New York:
Harper and Brothers, 1940, pp. 78-79.





Fig. 1. Showing sample of three line widths for six point type. Panels: Standard, 13 picas; Short line, 5 picas; Long line, 36 picas.







THIRTEEN PICAS VERSUS FIVE PICAS 


Examination of the results in Table I reveals striking differences 
in all of the analytical eye-movement measures. In only one respect 
(regression frequency) does the excessively short line have an advantage.
For the five-pica line width, fixation frequency is increased,
the number of words per fixation is decreased, the pause duration is
lengthened and perception time is greatly increased. These results
parallel our eye-movement findings for the reading of ten-point type
in an optimal line width in comparison with reading a relatively short
line.¹ The interpretation offered in the ten-point type study just
cited would appear to apply to the present study. This interpretation
stressed the reader’s inability to make maximum use of horizontal
peripheral cues in reading an unusually short line of print.

TABLE II. MEAN EYE-MOVEMENT MEASURES FOR TWENTY COLLEGE STUDENTS
Reading Six-Point Text Set in Thirteen-Pica and Thirty-six-Pica Line Widths

Note: Pause duration and perception time are reported in seconds. Ten
paragraphs from the Chapman-Cook Speed of Reading Test, Form A, were set in a
thirteen-pica line width and ten paragraphs from Form B of the same test were
set in a thirty-six-pica line width. All paragraphs were printed as follows: Scotch
Roman, lower case, six point, set solid on egg-shell paper stock.

Line width             Fixation    Words per   Pause      Perception   Regression
                       frequency   fixation    duration   time         frequency
13 picas                210.4       1.44        0.23        48.6         22.0
36 picas                215.9       1.42        0.24        51.8         36.8
Differences¹           + 5.5       −0.02       +0.01       + 3.2        +14.8
Per cent difference    + 2.6       −1.5        +3.9        + 6.6        +67.7














¹ Differences for regression frequency and pause duration are statistically
significant beyond the one per cent level; the difference in perception time is significant
at the 2 per cent level; differences in fixation frequency and words per fixation are
not significant (30 per cent level and 50 per cent level, respectively).


THIRTEEN PICAS VERSUS THIRTY-SIX PICAS 


Scrutiny of the results in Table II shows that the biggest difference 
in reading the excessively long line is found in the number of regres- 
sions. Pause duration is also lengthened as is perception time. There 
are, however, no significant differences in fixation frequency and 
words per fixation. The direction of all of the differences again 
parallels the results we found for ten-point type.¹ As in the former





¹ Paterson, D. G. and Tinker, M. A.: “Influence of Line Width on Eye
Movements.” J. Exper. Psychol., Vol. xxv, 1940, pp. 572-577.







study, we interpret the results to mean that the eyes, in reading an
excessively long line, are inaccurate in locating the beginning
of each new line.


SUMMARY 


(1) Eye-movement photographs of two groups of twenty subjects 
each indicate the changes in oculo-motor patterns involved in reading 
excessively short and long line widths for six-point type. 

(2) The retardation in rate of reading an excessively short line 
appears to be due to an inability to make maximum use of horizontal 
peripheral cues. 

(3) Although an excessively long line is read less efficiently than
a moderate line width, the difficulty seems to be primarily due to the
inability of the eyes to locate accurately the beginning of successive
lines of print.








BOOK REVIEWS 


JOHN G. DARLEY. Clinical Aspects and Interpretation of the Strong
Vocational Interest Blank, New York: Psychological Corporation,
1941, pp. 72.


Unlike tools and instruments used in other fields, testing devices 
are seldom presented with specific suggestions for their use. In an 
attempt to remedy this unfortunate circumstance with reference to 
one instrument, Darley presents in his monograph, Clinical Aspects 
and Interpretation of the Strong Vocational Interest Blank, a summary 
of clinical experience with the Strong Vocational Interest Blank as a 
part of the total diagnostic process. This well-written little book 
should serve as an informative source not only for the beginning 
student in this branch of psychology but also for the more experienced 
clinician. Opening his first chapter with a brief but clear description 
of the nature of the Interest Blank, Darley presents the basic facts 
regarding its content, the manner of its construction, the ratings 
achieved from it, and factors to be considered in its use. 

The focus of the discussion of the practical application of the Interest
Blank lies in the idea of interest “types” or patterns. Darley’s
work on factor analysis of the men’s blank resulted in six broad
classifications of interest: Technical, verbal, business contact, welfare,
business detail, and certified public accountant. A factor analysis
of the women’s blank resulted in the following patterns: Technical,
verbal, business contact, welfare, and non-professional interests.
He states, however, that women’s interests are “less channelized or
less professionally intense” than men’s and that it is often advisable
to administer the men’s blank also to strongly motivated “career”
women. 

For those desiring to make practical use of the Blank in terms of 
these interest types, Chapter II gives more specific information on 
the use of this concept. The analysis of interest scores into types 
or patterns is essentially the presentation of the scores in a logical 
and meaningful order, which aids in synthesizing and interpreting 
all the single scores. In this connection, primary, secondary, and 
tertiary patterns of occupational interest are defined. 

Sound advice is provided those desiring to do actual counselling
on vocational interest problems in a list of specific “do’s” and “don’ts”
with helpful explanatory discussions. Darley finds that in vocational
interviews it is far more effective in most cases to work “through




the pattern analysis procedures and eventually arrive at the specific 
high letter grade, as counselling progresses.” The author’s suggestions
are given more concrete form in the presentation of a number of
well-chosen illustrative cases; he also treats the case with no primary 
interest pattern. 

Studies on non-occupational determinants of interest types reveal 
no marked relationship between interest scores and Strong’s scales 
for interest-maturity, masculinity-femininity, and occupational level. 
However, another study, using the Minnesota Scale for Survey of 
Opinions, the Bell Adjustment Inventory, and the Minnesota Inven- 
tory of Social Attitudes, showed a clearer definition of personality 
traits in relation to interest types. 

Darley’s concluding contribution in this monograph is a challeng- 
ing list of problems that remain to be investigated in the field. Psy- 
chologists should find this work a most worth while and useful addition 
to the literature on the topic of interest measurement. 

BERTHA PETERSON HARPER. 


University of Rochester. 


FLOYD C. DOCKERAY. Psychology. New York: Prentice-Hall, 1942,
pp. 504.


“It is the aim of this text first to help the student to understand
human behavior, and second to encourage him to practice those
principles which apply to his own case.” The attempt to present the
material in the language of the student gives the impression at times 
of over-simplification, especially in the avoidance of certain technical 
terms which are the language of psychology. The viewpoint is strictly 
a stimulus-response one. Constant emphasis is placed upon scientific 
results versus common sense views. Both the biological and the social 
factors in behavior are stressed. 

There is a commendable departure from traditional categories in
organization of the discussion. Certain aspects of the organization, 
however, seem inadequate. Thus discussion of receptor mechanisms 
is far separated from sensory perception. Percentages of space 
devoted to certain topics are: Biological and physiological bases, 
nineteen; motivation and morale, nine; sensation and perception, 
ten; thinking, ten; personality, ten; feeling and emotion, seven; 
learning and remembering, sixteen. Possibly too little emphasis has 
been given to certain subjects such as color vision, relation of the 







nervous system to emotion, individual differences, integration of 
personality, and explanations of diagrams. There is a one-sided 
treatment of racial differences, and an inadequate discussion of 
causes of abnormality. 

Attention may be given to certain minor details: (1) Range of 
apprehension is wrongly termed range of attention. (2) Threshold
of feeling in Figure 65 is not in line with recent findings. (3) Footnote 
2, page 29, should be .000001. (4) Range of hearing is given as 20 
to 22,000 d.v. on page 33 and as 20 to 20,000 on page 38. (5) Paterson 
is spelled incorrectly on page 412. (6) Overestimation of acute 
angles and underestimation of obtuse angles (p. 237) cannot be used 
as an explanation of optical illusions. (7) The description of the 
device for photographing eye movements (p. 250) is in error. (8) 
While the typography of the text is adequate in general, it is unfortunate
that all-capital printing is used for emphasis, for this is illegible
and is disliked by readers.

There are many things to commend in this text. A few may be 
cited: (1) The section on heredity and environment is well balanced 
and sound. (2) Maturation is given adequate treatment. (3) 
Organization of material on learning and forgetting is well done. 
(4) The chapters on thinking form an outstanding part of the book. 
(5) The section on perceiving is well done and adequate. (6) There 
is a consistent objective viewpoint throughout the book. (7) Pointing 
out applications to everyday activities should be helpful to students. 

Although some instructors may object to parts of the discussions, 
others will find the text teachable and much to their liking. 

MILES A. TINKER.


University of Minnesota. 


NORMA E. CUTTS AND NICHOLAS MOSELEY. Practical School Discipline
and Mental Hygiene. Cambridge: Houghton Mifflin Company,
1941, pp. 324.


Concerned with everyday discipline problems of the schoolroom 
and their solution, this book was written ‘‘to answer the questions 
which experienced teachers, students in teachers’ colleges and parents 
of school children are constantly asking about what the school should 
do to discipline children.” Some of the chapter headings are “Prevention
of Disorder in the Classroom,” “Punishments: Their Use and
Abuse,” “Assistance From Experts,” and “The Teacher and Parents.”







The authors have written a sane, highly practical text, based on 
concrete classroom situations. Recommendations for the prevention 
and treatment of specific problems are included. 

JAMES D. PAGE. 


Temple University. 


ERNEST W. BURGESS, W. LLOYD WARNER, ALEXANDER FRANZ, AND
MARGARET MEADE. Environment and Education. Supplementary
Educational Monographs, No. 54, Chicago: University of
Chicago Press, 1942, pp. 66.


This series of lectures delivered as part of the Fiftieth Anniversary 
Celebration of the University of Chicago emphasizes the educative 
possibilities of various aspects of the environment. Mr. Burgess deals 
with the urban environment, Mr. Warner with social status, Dr.
Alexander with personality factors in the environment, and Miss 
Meade with the social environment as disclosed by studies of primi- 
tive society. Each of the lectures is a separate entity and apart from 
the fact that all deal with the same general topic, no attempt was made 
to weave them into a coherent and interrelated argument. 

The nearest approach to a controversy appeared in the papers of 
Miss Meade and Dr. Alexander. The latter added five pages of 
remarks to his lecture in an attempt to justify his contention that the 
same personality types can be found in different cultures. The inference
then was that “some factors other than cultural ones are at work
in their production” (page 63). To the reviewer, and to Miss Meade,
this question hinges upon a prior one; namely, how different must two 
cultures be before it can be claimed that similar personality types 
appearing in each of them do not result primarily from environmental 
influences? It is difficult to believe, despite Dr. Alexander’s exposi- 
tion, that the American and European cultures, for example, differ 
sufficiently to justify the conclusion that the appearance of similar 
types of personality in each of them must be explained in noncultural 
terms.

STEPHAN M. COREY.


University of Chicago. 


M. EZEKIEL. Methods of Correlational Analysis. (Second Ed.)
New York: John Wiley and Sons, pp. 530.


This is a rather extensive revision of the author’s earlier (1930)
book of the same title. Although written primarily for the economist,
the psychologist who wishes in his study of relationships to go beyond
the ordinary correlation coefficient will find this book extremely
worth while. It is, in the reviewer’s opinion, the best treatise on
correlational analysis available.

QUINN MCNEMAR.

Social Science Research Council.


PUBLICATIONS RECEIVED 


HAROLD SAXE TUTTLE. How Motives are Educated. Ann Arbor, Mich.:
Edwards Bros., Inc., 1941, pp. 205. (paper.)

SERGE VORONOFF. From Cretin to Genius. New York: Alliance Book
Corporation, 1941, pp. 281.

JOHN W. M. WHITING. Becoming a Kwoma: Teaching and Learning in a
New Guinea Tribe. New Haven: Yale University Press, 1941, pp. 226.

C. GILBERT WRENN AND D. L. HARLEY. Time on Their Hands: A Report
on Leisure, Recreation and Young People. Washington: American Council on
Education, 1941, pp. 266.













